This document gives a technical description of a multi-channel parametric
audio coding system as developed by Philips. The goal of this system is to describe an
m-channel signal by an n-channel signal, with n<m, and parameters describing a spatial image
in order to reconstruct the m-channel signal. Although the techniques described in this
document could be extended to coding any m to any n channels, the embodiments described
in this document provide a technical description of 5(.1) to 2 or 5(.1) to 1 channel
coding. The extension ".1" denotes the presence of an LFE channel. Furthermore, it is
assumed that when reproducing the multi-channel signals, a typical loudspeaker setup is used
consisting of a Left front (Lf), a Right front (Rf), a Centre (Cf), a Left surround (Ls), a Right
surround (Rs) and optionally a low-frequency effects (LFE) speaker.

HIGH LEVEL DESCRIPTION

Figure 1 shows a general block diagram of the multi-channel parametric
encoder. The multi-channel input signal consisting of the five channels Lf, Rf, Cf, Ls, Rs and
the optional LFE channel is analyzed, resulting in a set of parameters describing the spatial
image. Depending on the configuration either a mono down-mix channel M or a
stereo down-mix consisting of the left and right channels Ld and Rd is generated. This mono
(M) or stereo (Ld, Rd) signal is then encoded using a conventional mono or stereo audio
encoder respectively. The bit stream resulting from this encoding process is merged with a bit
stream derived from the coded spatial parameters, preferably in a backwards compatible
fashion.
A block diagram of the corresponding multi-channel parametric decoder is
given in Figure 2. First the bit stream is de-multiplexed, resulting in a (backwards compatible)
bit stream for the mono or stereo audio decoder and a spatial parameter bit stream. The mono
or stereo decoder then reconstructs the coded mono down-mix signal M' or the stereo
down-mix signal (Ld', Rd') respectively. Concurrently, the spatial parameters are decoded. Finally
the multi-channel signal is reconstructed by imposing the spatial parameters onto the
down-mix channel(s).
DETAILED TECHNICAL DESCRIPTION
Time/frequency transform 

The multi-channel analysis and reconstruction blocks require spatial analysis
and synthesis to be performed on individual time/frequency tiles. Therefore, a time/frequency
transform is required with the following prerequisites:

The transform is preferably complex, to enable measurement and modification
of (relative) phase values between input and output channels;

The transform should be oversampled, to avoid aliasing distortion which
would result from time and frequency dependent changes in a critically-sampled system;

The frequency resolution should be non-uniform according to the frequency resolution of the
human auditory system;

The time resolution is generally rather low, except in the case of transients.

A generalized block diagram of a spatial encoder is shown in Figure 3. A
multi-channel input signal is first transformed to the frequency domain. Subsequently, a
downmix and spatial parameters are generated. The downmixed signals are subsequently
transformed to the time domain. The decoder basically performs the inverse process.



PHNL040388EPQ 






02.04.2004 



Figure 3: Block diagram of generalized encoder processing stage (block labels: T/F, Processing stage; side output: Spatial parameters).

Currently, two time/frequency transforms are used which meet the
prerequisites mentioned above. The first transform comprises time-domain segmentation
followed by FFTs. All input signals are segmented and transformed by means of the Discrete
Fourier Transform (DFT):

$$ X_{i,l}[k] = \sum_{n} x_i[n + l h] \, h_a[n] \, \exp\left( -j \frac{2 \pi k n}{N} \right) $$

with x_i[n] the time domain input signal, h_a[n] the analysis window, l the frame index, h
the frame update in samples, N the DFT length and X_{i,l}[k] the DFT with frequency index
k. At the output of the multi-channel encoder/decoder the processed signals are
transformed back to the time domain by means of an inverse DFT:

$$ y_{i,l}[n] = \sum_{k=0}^{N-1} h_s[n] \cdot Y_{i,l}[k] \cdot \exp\left( j \frac{2 \pi k n}{N} \right) $$

with Y_{i,l}[k] the (zero-padded) DFT representation, y_{i,l}[n] the time-domain segment
corresponding to frame l and h_s[n] the synthesis window. The resulting output signal y_i[n]
is obtained by overlap-add of the segments y_{i,l}[n].
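
The segmentation, windowed DFT, inverse DFT and overlap-add described above can be illustrated with a small sketch. It is not the reference implementation: the sine window, the 50% overlap (h = N/2), the sample index running from 0 to N-1 and the 1/N scaling of the inverse DFT are all assumptions chosen so that analysis followed by synthesis reproduces the input.

```python
import cmath
import math


def analyse(x, N, hop, window):
    """Return one windowed N-point DFT per frame l (frame l starts at sample l*hop)."""
    frames = []
    for start in range(0, len(x) - N + 1, hop):
        seg = [x[start + n] * window[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)
        ])
    return frames


def synthesise(frames, N, hop, window, length):
    """Inverse DFT of each frame, synthesis windowing, then overlap-add."""
    y = [0.0] * length
    for l, Y in enumerate(frames):
        for n in range(N):
            s = sum(Y[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            y[l * hop + n] += window[n] * s.real
    return y


N, hop = 8, 4
# Sine window (assumption): window[n]**2 + window[n + N//2]**2 == 1,
# so the product of analysis and synthesis windows overlap-adds to one.
window = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]
x = [math.sin(0.3 * n) for n in range(32)]
y = synthesise(analyse(x, N, hop, window), N, hop, window, len(x))
# Only samples covered by two frames are compared (the first and last hop are not).
max_error = max(abs(x[n] - y[n]) for n in range(hop, len(x) - hop))
```

In a real coder the spectra would be modified between `analyse` and `synthesise`; the oversampling provided by the overlapping windows is what keeps such modifications from producing aliasing.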

The second transform, which is of particular interest for memory and
computational complexity reasons, is a complex-exponential modulated filter bank.

In the following sections, it is assumed that individual time/frequency tiles of 
all input channels are available for processing at both the encoder and decoder side. 

System based on Stereo downmix 
Encoder 

The aim of this encoder is to represent a 5.1-channel input signal as a
backwards-compatible stereo signal (i.e., with a spatial representation which resembles the
5.1 channel reconstruction as closely as possible), combined with spatial parameters that enable
reconstruction of a 5.1-channel output that resembles the original 5.1 input signals from a
perceptual point of view. The structure of this system is depicted in Fig. 4.
The encoder consists of three parallel stages which all convert a stereo input
signal to a mono signal, combined with parameter extraction which represents the spatial
cues between the respective input signals. Each of these three parallel blocks computes
(assuming input signals X_1 and X_2, and output signal Y):

The ratio of the powers of corresponding time/frequency tiles of the input
signals (which will be denoted 'Interchannel Level Difference', or ILD), given by:

$$ \mathrm{ILD} = 10 \log_{10} \frac{\sum_k X_1[k] X_1^*[k]}{\sum_k X_2[k] X_2^*[k]} $$

The average phase difference (or the phase difference which maximizes the
correlation between the input signals), which is referred to as 'Interchannel Phase
Difference', or IPD, given by:

$$ \mathrm{IPD} = \angle \left( \sum_k X_1[k] X_2^*[k] \right) $$

The coherence (ICC), which is the normalized cross-correlation between the
input signals given the IPD value as in (2):

$$ \mathrm{ICC} = \frac{\left| \sum_k X_1[k] X_2^*[k] \right|}{\sqrt{\left( \sum_k X_1[k] X_1^*[k] \right) \left( \sum_k X_2[k] X_2^*[k] \right)}} $$
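
As an illustration of the three cues, the following sketch computes ILD, IPD and ICC for one time/frequency tile given as two lists of complex DFT bins. The function name `spatial_parameters` is hypothetical, and the sketch assumes both tiles have non-zero power.

```python
import cmath
import math


def spatial_parameters(x1, x2):
    """Return (ILD in dB, IPD in radians, ICC) for one tile of complex bins."""
    p1 = sum(abs(a) ** 2 for a in x1)                        # sum_k X1[k] X1*[k]
    p2 = sum(abs(b) ** 2 for b in x2)                        # sum_k X2[k] X2*[k]
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))   # sum_k X1[k] X2*[k]
    ild = 10.0 * math.log10(p1 / p2)
    ipd = cmath.phase(cross)
    icc = abs(cross) / math.sqrt(p1 * p2)
    return ild, ipd, icc


x1 = [1 + 0j, 0 + 2j, -1 + 1j]
x2 = [0.5j * a for a in x1]   # x1 at half amplitude, phase-shifted by +90 degrees
ild, ipd, icc = spatial_parameters(x1, x2)
```

Because `x2` is a scaled and phase-rotated copy of `x1`, the tile is fully coherent (ICC of 1), the level difference is 20·log10(2) dB and the phase difference is -90 degrees.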



Each parallel processing block generates a parameter set which comprises all
or a selected number of these parameters for each time/frequency tile (depending on the
desired parameter bitrate, some parameters are not transmitted). Besides parameters, each
parallel stage also computes a single output signal, which is a linear (complex) combination
of the two input signals:

$$ Y[k] = w_1 X_1[k] + w_2 X_2[k] $$

with w_1 and w_2 complex weights which depend on the extracted parameters (w_i = f(IID, IPD,
ICC)). Preferably each time/frequency tile of Y has a power that is equal to the sum of the
powers of the input signals X_1 and X_2.

A fourth parameter which is required for reconstruction of phase differences in
the encoder is the Overall Phase Difference (or OPD), which is defined as the average phase
difference between the first input signal and the mono output signal.
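
The power constraint on the mono output can be sketched as follows. The text does not give the weights as a function of the parameters, so this example assumes the simplest possible choice: equal real weights, scaled per tile so that the output power equals the sum of the input powers. The function name is hypothetical.

```python
import math


def downmix_power_preserving(x1, x2):
    """Sum two tiles and rescale so that the output power equals P(x1) + P(x2)."""
    summed = [a + b for a, b in zip(x1, x2)]
    target = sum(abs(a) ** 2 for a in x1) + sum(abs(b) ** 2 for b in x2)
    actual = sum(abs(v) ** 2 for v in summed)
    if actual == 0.0:
        # Complete cancellation: no scaling can restore the power (assumed fallback).
        return summed
    gain = math.sqrt(target / actual)   # here w1 = w2 = gain
    return [gain * v for v in summed]


x1 = [1 + 0j, 1j]
x2 = [1 + 0j, -0.5j]
y = downmix_power_preserving(x1, x2)
power_y = sum(abs(v) ** 2 for v in y)   # equals 2.0 + 1.25
```

A phase alignment of the inputs before summation, based on the IPD, would reduce the cancellation that this gain has to compensate.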

The next stage of the encoder performs a down-mix from three (L, C, R) to
two down-mix channels (Lo, Ro), combined with corresponding spatial parameters. Each
down-mix channel is a linear combination of the input signals L, R and C:

$$ L_o[k] = \alpha_1 L[k] + \alpha_2 R[k] + \alpha_3 C[k], $$
$$ R_o[k] = \beta_1 L[k] + \beta_2 R[k] + \beta_3 C[k]. $$

The parameters \alpha_i and \beta_i are chosen such that a good stereo image of the
stereo signal consisting of L_o[k] and R_o[k] is obtained. One of the prerequisites for a good
stereo image is that \alpha_3 equals \beta_3.

At the decoder, channels L, R and C are predicted using the two down-mix
channels Lo and Ro as follows:

$$ \hat{L}[k] = C_{1,L} L_o[k] + C_{2,L} R_o[k], $$
$$ \hat{R}[k] = C_{1,R} L_o[k] + C_{2,R} R_o[k], $$
$$ \hat{C}[k] = C_{1,C} L_o[k] + C_{2,C} R_o[k]. $$

To this end, parameters C_{1,Z} and C_{2,Z} (for Z = L, R or C) are computed at the
encoder and sent to the decoder.

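A minimal Euclidean norm of the difference of signal Z[k] and its estimation
\hat{Z}[k] is used as optimization criterion to find the parameters C_{1,Z} and C_{2,Z}. The square of the
Euclidean norm of the difference of the original input channel Z[k] and its estimation at the
decoder \hat{Z}[k] can be written as:

$$ \sum_k \left| Z[k] - \hat{Z}[k] \right|^2 , $$

with

$$ \hat{Z}[k] = C_{1,Z} L_o[k] + C_{2,Z} R_o[k] . $$

Minimizing \sum_k |Z[k] - \hat{Z}[k]|^2 leads to the following expressions:

$$ C_{1,Z} = \frac{\langle L_o[k], Z[k] \rangle^* \, \| R_o[k] \|^2 - \langle R_o[k], Z[k] \rangle^* \, \langle L_o[k], R_o[k] \rangle^*}{\| L_o[k] \|^2 \, \| R_o[k] \|^2 - \left| \langle L_o[k], R_o[k] \rangle \right|^2} $$

$$ C_{2,Z} = \frac{\langle R_o[k], Z[k] \rangle^* \, \| L_o[k] \|^2 - \langle L_o[k], Z[k] \rangle^* \, \langle L_o[k], R_o[k] \rangle}{\| L_o[k] \|^2 \, \| R_o[k] \|^2 - \left| \langle L_o[k], R_o[k] \rangle \right|^2} $$

with

$$ \langle X[k], Y[k] \rangle = \sum_k X[k] Y^*[k], \qquad \| X[k] \|^2 = \sum_k X[k] X^*[k] . $$

For the coefficients C_{1,Z} and C_{2,Z} the following relations can be derived:

$$ C_{1,R} = C_{1,L} - 1, \quad C_{2,L} = C_{2,R} - 1, \quad C_{1,C} = 1 - C_{1,L}, \quad C_{2,C} = 1 - C_{2,R} . $$

Having 6 variables (C_{1,L}, C_{2,L}, C_{1,R}, C_{2,R}, C_{1,C} and C_{2,C}) and at the same time 4
relations between these parameters, only 2 parameters need to be sent to the decoder. C_{1,L}
(henceforth denoted by \beta) and C_{2,R} (henceforth denoted by \gamma) are transmitted to the decoder
because of their similar statistical distributions.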
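
The least-squares fit behind the prediction coefficients can be checked numerically. The sketch below solves the two normal equations of the minimization for one tile, using the inner product sum_k X[k] Y*[k]; the function names are hypothetical, and the sketch assumes that Lo and Ro are not proportional to each other (otherwise the determinant is zero).

```python
def inner(x, y):
    """Inner product <x, y> = sum_k x[k] * conj(y[k])."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def prediction_coefficients(lo, ro, z):
    """Return (C1, C2) minimising sum_k |z[k] - C1*lo[k] - C2*ro[k]|^2."""
    ll = inner(lo, lo).real             # ||Lo||^2
    rr = inner(ro, ro).real             # ||Ro||^2
    lr = inner(lo, ro)                  # <Lo, Ro>
    lz = inner(lo, z).conjugate()       # <Lo, Z>*
    rz = inner(ro, z).conjugate()       # <Ro, Z>*
    det = ll * rr - abs(lr) ** 2
    c1 = (lz * rr - rz * lr.conjugate()) / det
    c2 = (rz * ll - lz * lr) / det
    return c1, c2


lo = [1 + 1j, 2 - 1j, 0.5j, -1 + 0j]
ro = [1 - 2j, 0.5 + 0j, 1 + 1j, 2j]
# A channel that is exactly predictable from the down-mix pair.
z = [(0.3 + 0.1j) * a + (0.7 - 0.2j) * b for a, b in zip(lo, ro)]
c1, c2 = prediction_coefficients(lo, ro, z)
```

For this constructed `z` the fit recovers the weights 0.3+0.1j and 0.7-0.2j; for a real channel the residual is generally non-zero and the coefficients are the best linear prediction in the least-squares sense.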
Before the actual downmix is applied (i.e., L, C and R are combined to Lo and
Ro), the signals L, C and R are preferably pre-conditioned by first applying a phase shift to L,
R, and/or C to ensure that the signal pair L and C on the one hand and R and C on the other
hand have a non-negative correlation, to minimize energy loss by the downmix process. The
applied phase shifts can be transmitted without any bit-rate costs by superimposing these
phase shifts on the OPD values resulting from the individual two-to-one stages.

If no LFE channel is present at the encoder input, the LFE channel can be
considered as containing zeros only and all related processing steps can be discarded. In that
case, parameter set 2 as shown in Figure 4 is irrelevant and does not have to be transmitted.



Decoder 

The decoder basically performs the reverse process as depicted in Figure 4.
In a first stage, the stereo input signal (Lo, Ro) is converted to a three-channel signal (L, C,
R) based on parameter set 4. The upmix, based on the transmitted parameters \beta and \gamma, is
performed as follows:

$$ L[k] = \beta L_o[k] + (\gamma - 1) R_o[k] $$
$$ R[k] = (\beta - 1) L_o[k] + \gamma R_o[k] $$
$$ C[k] = (1 - \beta) L_o[k] + (1 - \gamma) R_o[k] $$

If a 3-channel reconstruction is desired, the decoding process is finished.
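
A minimal sketch of this two-to-three upmix for one tile is given below. The rows for L and C follow the coefficients stated in the text; the row for R is written by symmetry with the L row (coefficients beta - 1 and gamma), which is an assumption that should be checked against the original equations. The function name is hypothetical.

```python
def upmix_2to3(lo, ro, beta, gamma):
    """Predict (L, R, C) from the down-mix pair (Lo, Ro) with parameters beta and gamma."""
    left = [beta * a + (gamma - 1) * b for a, b in zip(lo, ro)]
    right = [(beta - 1) * a + gamma * b for a, b in zip(lo, ro)]   # assumed by symmetry
    centre = [(1 - beta) * a + (1 - gamma) * b for a, b in zip(lo, ro)]
    return left, right, centre


lo = [1.0 + 0.5j, -0.25j, 2.0 + 0j]
ro = [0.5 - 1.0j, 1.0 + 0j, -1.0 + 1.0j]
left, right, centre = upmix_2to3(lo, ro, beta=0.8, gamma=0.7)
# Consistency check: L + C gives back Lo and R + C gives back Ro.
err_lo = max(abs(l + c - a) for l, c, a in zip(left, centre, lo))
err_ro = max(abs(r + c - b) for r, c, b in zip(right, centre, ro))
```

With these coefficients the predicted channels always satisfy L + C = Lo and R + C = Ro, independent of beta and gamma, so only the distribution between the side channels and the centre is controlled by the two transmitted parameters.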

For e.g. a 3.1, 5 or 5.1-channel reconstruction, the mono signals L, C, and R
are subdivided into two-channel signals, based on the spatial parameters corresponding to
each signal. The general structure of the mono-to-stereo upmix block is given in Figure 5.
Figure 5: Generalized structure of the mono-to-stereo upmix stage.

A local copy of the mono input signal X is fed through a decorrelation filter H.
This filter can be implemented as a (frequency dependent) delay or reverberation module.
The mono input signal X and its decorrelated copy Q are subsequently combined in a mixing
stage to form the stereo output signal (Y_1, Y_2):

$$ \begin{bmatrix} Y_1[k] \\ Y_2[k] \end{bmatrix} = \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix} \begin{bmatrix} \cos(\chi) & 0 \\ 0 & \sin(\chi) \end{bmatrix} \begin{bmatrix} X[k] \\ Q[k] \end{bmatrix} $$

with

$$ \chi = \arctan \sqrt{\frac{1 - \sqrt{\mu}}{1 + \sqrt{\mu}}} , \qquad \alpha = \frac{1}{2} \arctan \left( \frac{2 \, c \, \mathrm{ICC}}{c^2 - 1} \right) , $$

$$ \mu = 1 + \frac{4 \, \mathrm{ICC}^2 - 4}{\left( c + 1/c \right)^2} , \qquad c = 10^{\mathrm{IID}/20} . $$

If desired, this process is repeated for all three signals L, C, and R, resulting in
a 5.1 (Lf, Rf, Ls, Rs, Cf, LFE) signal. It should be noted that the decorrelation filter H can be
different for different mono-to-stereo processing blocks.
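
The mixing stage can be sketched as a scaling of X and Q followed by a rotation, with both angles derived from the level difference and the coherence. This is an illustration under stated assumptions rather than the specified mixing rule: it assumes X and Q are orthogonal with equal power, expresses the level difference as an amplitude ratio c = 10^(IID/20), ignores IPD/OPD and overall gain, and uses `atan2` instead of a plain arctangent so that negative level differences land in the correct quadrant. All function names are hypothetical.

```python
import math


def mixing_angles(iid_db, icc):
    """Return (alpha, chi): rotation angle and X/Q balance angle."""
    c = 10.0 ** (iid_db / 20.0)
    mu = 1.0 + (4.0 * icc ** 2 - 4.0) / (c + 1.0 / c) ** 2
    root = math.sqrt(max(mu, 0.0))            # guard against rounding below zero
    chi = math.atan(math.sqrt((1.0 - root) / (1.0 + root)))
    alpha = 0.5 * math.atan2(2.0 * c * icc, c * c - 1.0)
    return alpha, chi


def upmix_1to2(x, q, iid_db, icc):
    """Mix mono tile x and its decorrelated copy q into a stereo pair (y1, y2)."""
    alpha, chi = mixing_angles(iid_db, icc)
    ca, sa = math.cos(alpha), math.sin(alpha)
    cx, sx = math.cos(chi), math.sin(chi)
    y1 = [ca * cx * a - sa * sx * b for a, b in zip(x, q)]
    y2 = [sa * cx * a + ca * sx * b for a, b in zip(x, q)]
    return y1, y2


def measure(y1, y2):
    """Return (level difference in dB, normalised correlation) of a real-valued pair."""
    p1 = sum(v * v for v in y1)
    p2 = sum(v * v for v in y2)
    cross = sum(a * b for a, b in zip(y1, y2))
    return 10.0 * math.log10(p1 / p2), cross / math.sqrt(p1 * p2)


x = [1.0, 0.0, -1.0, 0.0]
q = [0.0, 1.0, 0.0, -1.0]   # same power as x and orthogonal to it
y1, y2 = upmix_1to2(x, q, iid_db=6.0, icc=0.5)
iid_out, icc_out = measure(y1, y2)
```

Measuring the output pair gives back the requested 6 dB level difference and a correlation of 0.5, which is the property the mixing stage is meant to provide.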



Enhancement methods 

The backwards-compatible stereo downmix will usually result in a
significantly reduced spatial image compared to the original 5-channel reconstruction. It is
possible to enhance the stereo downmix in such a way that the perceived spatial image of the
downmix resembles the 5-channel reconstruction more closely by introducing virtual
surround loudspeakers (sometimes denoted as a 'stereo widening' algorithm). The
generation of virtual surround loudspeakers is based on cross-talk cancellation principles.
Various methods for stereo widening are currently available. However, these systems have
an important drawback: if they are applied to a stereo downmix they result in a widening of
the complete sound stage instead of a widening of the surround signals only. To overcome
this drawback, we propose a cross-talk cancellation algorithm that is applied to the stereo
downmix, in which the amount of cross-talk cancellation depends on the spatial encoding
parameters. Using this method, (1) only signal parts that would have been reproduced by
surround loudspeakers in a 5-channel setup are processed, and (2) the cross-talk cancellation
algorithm can be inverted, which is a requirement for high-quality 5-channel reconstruction.
System based on mono-downmix

Approach 1

Encoder

A straightforward approach for a 5.1-to-1 encoder is depicted in Figure 6. The
encoder consists of subsequent stereo-to-mono analysis blocks, which are identical to those
used in Figure 4.
Figure 6: Structure of 5.1-to-one encoder.



This approach reduces the number of input channels pair-wise using
stereo-to-mono blocks. Each block generates a mono signal and spatial parameters, of which a
selection is transmitted. A recommended parameter selection includes:

Parameter set 1:
IID, ICC for each time/frequency tile;
IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 2:
IID, ICC for each time/frequency tile;
IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 3:
IID only for time/frequency tiles up to about 150 Hz.

Parameter set 4:
IID, ICC for each time/frequency tile;
Optionally: IPD, OPD for time/frequency tiles up to about 2 kHz.

Parameter set 5:
IID for each time/frequency tile.

Decoder 

A corresponding decoder consists of the reverse process as depicted in Figure 6. The 
individual building blocks are equal to those described in detail in Section 0. 

Approach 2 

Instead of performing a spatial analysis and downmix on a pair-wise basis, this approach
performs the downmix in a single stage based on multidimensional Principal Component
Analysis (PCA). Assuming 5 complex-valued input signals X_i[k], the complex-valued
covariance matrix S_{ij} of the incoming signals is given by:

$$ S_{ij} = \sum_k X_i[k] X_j^*[k] $$

Note that S_{ij} = S_{ji}^*. In principle the non-diagonal elements of the covariance matrix (i.e., i not
equal to j) are complex. However, it is possible to extract and transmit the complex angles of
S_{ij} separately, resulting in a covariance matrix containing elements equal to or larger than zero
only. We will describe a real-valued covariance matrix from this point on.

It is assumed that the input signals X_i consist of a rotation (R) of a set of
orthogonal signals Y_i (i.e., R^T = R^{-1}):

$$ X = R Y . $$

Given the fact that the individual signals of Y are orthogonal, the covariance
matrix of Y, S_Y, is diagonal. Consequently, the covariance matrix of X, S_X, can be written as:

$$ S_X = R \, S_Y \, R^T . $$

Given the diagonal matrix S_Y, the above expression equals the
eigenvalue/eigenvector decomposition of S_X. This means that the rotation matrix R and the
eigenvalues (which are equal to the energies of the orthogonal signals of Y) can be obtained
by an eigenvalue/eigenvector decomposition of the covariance matrix, S_X.
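
To make the role of the decomposition concrete, the sketch below builds a real-valued covariance matrix from a set of channels and extracts only the first principal component, which is the signal transmitted as mono downmix. Power iteration is used here purely for illustration; the text only asks for an eigenvalue/eigenvector decomposition and does not prescribe an algorithm. Function names are hypothetical, and the sketch assumes the dominant eigenvector is not orthogonal to the uniform starting vector.

```python
import math


def covariance(channels):
    """S[i][j] = sum_k X_i[k] * X_j[k] for real-valued channels."""
    m = len(channels)
    return [[sum(a * b for a, b in zip(channels[i], channels[j])) for j in range(m)]
            for i in range(m)]


def principal_component(s, iterations=200):
    """Return (largest eigenvalue, unit eigenvector) of a symmetric matrix by power iteration."""
    m = len(s)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(s[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(s[i][j] * v[j] for j in range(m)) for i in range(m))
    return eigenvalue, v


def first_component_signal(channels, v):
    """Project the channels onto the eigenvector: the mono signal to transmit."""
    return [sum(v[i] * ch[k] for i, ch in enumerate(channels))
            for k in range(len(channels[0]))]


channels = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # second channel is twice the first
eigenvalue, v = principal_component(covariance(channels))
mono = first_component_signal(channels, v)
mono_energy = sum(sample * sample for sample in mono)
```

In this fully correlated example all energy (14 + 56 = 70) ends up in the first component, so the mono signal carries everything and the remaining components are empty.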

The general encoder approach then consists of:
computing and applying (relative) phase parameters of the input signals in order to result in a
covariance matrix which contains smaller imaginary parts; a possible technique is applying
complex principal component analysis;
computing the eigenvalue/eigenvector decomposition of the complex covariance matrix,
resulting from the original or modified input signals;
(inverse) rotation of the input signals to obtain the principal components;
transmission of the first principal component only (i.e., the component corresponding to the
largest eigenvalue) as mono output signal;
transmission of:
either the rotation matrix R (e.g. in terms of its angular components) and the (relative)
powers of each principal component, or
the (real- or complex-valued) covariance matrix itself in terms of IPDs, correlations and
power ratios;
the relative complex phase angles;
preferably an overall phase shift of the principal component signal.

The decoder consists of the following steps:
computing the rotation matrix R (either from the direct transmission or from an eigenvalue
decomposition of the transmitted covariance matrix);
computing a set of orthogonal signals by means of decorrelating the mono input signal;
scaling the orthogonal signals by means of the transmitted (relative) signal powers;
generation of a five-channel output by rotating the orthogonal signals;
re-instating the complex phase relationships based on transmitted (relative, IPD, and overall,
OPD) phase information.

Approach 3 

The five-dimensional PCA described as Approach 2 is relatively complex compared to the
cascaded approach described in Approach 1, especially since it is not possible to derive the
eigenvectors analytically. As a compromise between both approaches the cascaded approach
could be combined with three-dimensional PCA. This is illustrated in Figure 7.
Figure 7: Hybrid encoder approach.

The front (F), surround (S) and centre (C) channels are derived similarly to
Approach 1. Subsequently these signals are used as input to a three-dimensional PCA
procedure resulting in a 3x3 rotation matrix R. The same procedure for encoding and
decoding as described in Approach 2 can be used.

Alternatively, the three channels input to the three-dimensional PCA
procedure could consist of
Downmix of Lf and Ls
Downmix of Rf and Rs
Downmix of Cf and LFE



Residual-coding extensions 

Most encoder processing blocks reduce the number of input channels,
combined with a parameterization of the relations between the original input channels. In
principle, each reduction in number of channels results in an output signal, but also in a
complementary residual signal (or more residual signals), which would in principle allow
perfect reconstruction of the original input signals. In the decoder, these residual signals are
artificially regenerated by decorrelation filters. It should be noted, however, that it can occur
for certain time/frequency tiles that this synthetic residual signal is inappropriate for a
perceptually high-quality decoder output signal. For these time/frequency tiles, it is possible
to encode and transmit the actual residual signal from the encoder, while for remaining
time/frequency tiles, the decoder can use the synthetic residual signal. The scalability options
of this approach are of interest; the encoded residual signals can simply be removed from the
bitstream. In that case, the decoder will default to its synthetic residual signal. This approach
can be used for three-to-one, three-to-two, as well as two-to-one processing blocks.

Bit-stream syntax 

As already mentioned on page 1, the basic techniques used for multi-channel
parametric coding can be applied to code different multi-channel signal configurations. This
is reflected in the proposed bit-stream syntax. Some example possibilities:
encode an Lf, Rf, Ls, Rs (4 channels) multi-channel signal into a mono or stereo-compatible
signal,
encode an L(f), C, R(f), LFE (3.1 channels) multi-channel signal into a mono or
stereo-compatible signal, or
encode an Lf, C, Rf, Ls, Rs, LFE (5.1 channels) multi-channel signal into an L(f), C, R(f)
multi-channel compatible signal.

The bit-stream should be very flexible for decoding subsets of the
multi-channel signal. Consider the situation where the original signal consisted of a 5.1
multi-channel signal encoded into a stereo signal using a similar structure as depicted in Figure 4.
Furthermore, assume that reconstruction is required for a 3.1 setup, consisting of a Left
(front), Right (front), Centre (front) and an LFE speaker. By only decoding using parameter
sets 2 and 4 the 3.1 channel signal is already obtained. Hence, it is not necessary to further
decode the surround channels.

In the bit-stream syntax the flexibility described above is obtained by
explicitly defining the channel configuration of each encoder/decoder step in the
mc_channel_config element. In this element it is described which parameters belong to
which (intermediate) input and (intermediate) output channels (see Table 16).

Consider the very simple case where the input signal consists of the mono
signal M and the output signal consists of the mono signal M' and the signal LFE. If the
frequency range of the LFE signal is limited with respect to the bandwidth of the mono
signal M, which is typically the case, the signals M' and LFE are obtained only for the lower
part of the frequency range. For the higher part of the frequency range the signal M' is
simply obtained as M'=M. A similar situation might occur in a 1 to 3 decoding step. Consider
the case where the input signal consists of mono signal M and the output signal consists of
the left signal L(f), the right signal R(f) and the LFE signal. For the frequency range for
which the LFE has been included, parameters for all three signals are included, which may
e.g. consist of IID values between L and R and between L and LFE and coherence values
between L and R, L and LFE and R and LFE (3 sets!). For the higher frequency range, the
parameters will then only consist of IID values between L and R and coherence values
between L and R. For this frequency range the decoder operates in a 1 channel to 2 channels
decoding mode.

For the elements data1to3() two alternatives are given. The first one uses a
representation containing the parameters of the covariance matrix, split into power ratios and
coherence values. The second alternative uses an angular representation (e.g. as defined by
Euler) of the rotation matrix R.



Table 1 - Syntax of mc_data()

Syntax                                        Num. bits   Mnemonic
mc_data() {
  if (mc_channel_config == 1) {               1           uimsbf   Note 1
    mc_channel_config()
  }
  for (s=0; s<nr_steps; s++) {
    switch (method[s]) {
      case 0:
        data1to2()
        break;
      case 1:
        data2to3()
        break;
      case 2:
        data1to3()
        break;
      case 3:
        data1to5()
        break;
      case 4:
        data1to51()
        break;
    }
  }
}

Note 1: nr_steps is defined in the mc_channel_config() element. Hence, for the
first instance of mc_data() mc_channel_config should be set to %1.



Table 2 - Syntax of mc_channel_config()

Syntax                                              Num. bits   Mnemonic
mc_channel_config() {
  nr_downmix_ch_coded                               1           Note 1
  nr_output_ch_coded                                3           Note 2
  nr_steps = 0;
  nr_dec_ch = nr_downmix_ch;
  while (nr_dec_ch < nr_output_ch) {
    for (ch=0; ch<method_in_ch[nr_steps]; ch++) {
      input_ch[nr_steps,ch]                         3           Note 3
    }
    for (ch=0; ch<method_out_ch[nr_steps]; ch++) {
      output_ch[nr_steps,ch]                        3           Note 3
    }
    method[nr_steps]                                3
    nr_dec_ch += nr_increase_ch[method[nr_steps]];
    nr_steps += 1
  }
  num_env_default                                   2           Table 15
}

Note 1: nr_downmix_ch = nr_downmix_ch_coded + 1

Note 2: nr_output_ch = nr_downmix_ch + nr_output_ch_coded;

Note 3: See Table 16. LFE can never be input channel!
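
The loop in mc_channel_config() keeps reading decoding steps until the number of decoded channels reaches the number of output channels. The following sketch mimics only that bookkeeping; it does not parse a real bit stream. The method codes are passed in as a plain list instead of being read as 3-bit fields, the per-method channel increase is taken from Table 14, and the function name `plan_steps` is hypothetical.

```python
# Channel increase per method code, as listed in Table 14 (codes 5..7 are reserved).
METHODS = {
    0: ("1 to 2", 1),
    1: ("2 to 3", 1),
    2: ("1 to 3", 2),
    3: ("1 to 5", 4),
    4: ("1 to 5.1", 5),
}


def plan_steps(nr_downmix_ch_coded, nr_output_ch_coded, methods):
    """Return the names of the decoding steps consumed by the while loop."""
    nr_downmix_ch = nr_downmix_ch_coded + 1              # Note 1
    nr_output_ch = nr_downmix_ch + nr_output_ch_coded    # Note 2
    nr_dec_ch = nr_downmix_ch
    codes = iter(methods)
    steps = []
    while nr_dec_ch < nr_output_ch:
        name, increase = METHODS[next(codes)]
        nr_dec_ch += increase
        steps.append(name)
    return steps


# Stereo down-mix expanded to 5.1: one 2-to-3 step followed by three 1-to-2 steps.
stereo_plan = plan_steps(1, 4, [1, 0, 0, 0])
# Mono down-mix expanded to 5.1 in a single step.
mono_plan = plan_steps(0, 5, [4])
```

The stereo example corresponds to the structure of Figure 4 (Lo, Ro to L, C, R, then each of the three signals to a pair), which needs four steps in total.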
Table 3 - Syntax of data1to2



Syntax 


Num. 
bits 


Mnemonic 


datalto2{ 

if (datalto2_header) { 
if (enable Jid) { 
iid.mode 

1 
/ 

if (enablejcc) { 
ice mode 

} 

if (enablejpdopd) { 
ipdopd mode 

} 

freq_res 


1 

1 

99 
• • 


uimsbf 


1 

• • 


uimsbf 


1 

?? 


uimsbf 


2 


Table 17 


if (var_framing) { 

for (e=0; e< nmii_env; e++) 

env_pos[e] 

) 

} 

for (cli=0; 
ch<method_out_ch[nr_steps]; ch+H-) { 

if 

(output_ch[nr_st:eps,ch]=6) { 

m-Jbands coded 

} 

} 


1 
2 


uimsbf 
uimsbf 


5 


uimsbf 


• • 


Note 1 


} 

if (enable_iid) { 

for (e=0; e<num_env; e-H-) { 
iid__dt[e] 

iid_data(e,nr_bands_coded) 

} 


1 


uimsbf 


} 

if (enable_icc) { 

for (e=0; e<num_env; e++) { 
icc_dt[e] 

icc_data(e,nr bands coded) 

} 

} 


1 


uimusbf 
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if (enable_ipdopd) { 

for (e=0; e<num_env; e-H-) { 
ipd_dt[e] 

ipd_data(e,nr_ipdopd_coded) 
opd_dt[e] 

opd data(e,nr ipdopd_coded) 
} 

} 


1 
1 


uimsbf 
uimsbf 


Note 1: In case one of the output channels is the LFE channel only a part of the
signal is coded, denoted with nr_bands_coded.
Table 4 - Syntax of data2to3



Syntax 



data2to3{ 



e++){ 



ch++) { 
{ 



if (data2to3_header) { 

if (enable_beta) { 
betajuode 

} 

if (enable^gaimna) { 
gamma^mode 

} 

if (enable_opd) { 
opd.mode 

} 

freq_res 

if (var„framing) { 

for (e=0; e<num_env_lcr; 



} 



env_posJcr[e] 



} 



} else { 

num_env_lcr = num_env jBrame 
} 

for (ch=0; cli<method_out_ch[nr_steps]; 
if (output__ch[nr_steps,ch]=6) 
nr_bands_coded 

} 

} 



if (enable_beta) { 

for (e=0; e<num_env; e-n-) { 
beta_dt[e] 

beta_data(e,nr_bands„coded) 

} 

} 

if (enable^gamma) { 

for (e=0; e<num_env; e++) { 
gaiiima_dt[e] 

gamma_data(e,nr_bands_coded) 
} 

} 



Num 
. bits 



1 
1 

99 



99 



1 

99 



99 



Mnemo 
nic 



uinisbf 



uimsbf 



uimsbf 



uimsbf 
uimsbf 

uimsbf 



Note 1 



uimsbf 



uimsbf 
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if (enable_opd) { 

for (e=0; e<num„env; e-l-+) { 
opd_dt[e] 

opd data(e,iir_opd_coded) 

} 

} 


1 


uimsbf 


Note 1: In case one of the output channels is the LFE channel only a part of the
signal is coded, denoted with nr_bands_coded.
Table 5 - Syntax of data1to3 (Alternative A)



Syntax 



Num, 
bits 



Mnemonic 



datalto3{ 



if (datalto3_header) { 
if (enable Jid) { 
iidjmode 

} 

if (enablejcc) { 
icc_mode 

} 

if (enable jpdopd) { 
ipdopd_mode 

} 

freq__res 

if (var_framing) { 

for (e=0; e< num_env; e-H-) 



env_pos[e] 



} 



for (cli=0; 
ch<method_out_ch[nr_steps]; ch-hf) { 

if 

(output_ch[nr_steps,ch]==6) { 



nr_bands_coded_set2 



} 



} 



} 



if (enable_iid) { 

for (e=0; e<num_env; e++) { 
iidjt[e] 

iid_data(e,nr_bands_coded) 
iid_dt[e] 

iid_data(e,nr_bands„coded_set2) 
} 

} 

if (enable__icc) { 

for (e=0; e<num_env; e++) { 
icc_dt[e] 

icc_data(e,nr__bands_coded) 
icc_dt[e] 



1 
1 

?? 
1 

?? 
1 

99 



99 



uimsbf 



uimsbf 



uimsbf 



Table 17 

uimsbf 
uimsbf 

uimsbf 



Note 1 



uimsbf 
uimsbf 



uimsbf 
uimsbf 
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icc_data(e,nr_bands_coded_set2) 
icc_dt[e] 

icc„data(e,tir_bands_coded_set2) 
} 

} 

if (enable_ipdopd) { 

iir = 

inin(nr„bands_coded_set2,nr„ipdopd_coded); 

for (e=0; e<num_env; e++) { 
ipd_dt[e] 

ipd_„data(e,nr_ipdopd_coded) 

ipd_dt[e] 

ipd_data(e,iir) 

opd_dt[e] 

opd_data(e,nr) 



} 



uimsbf 



uimsbf 
uimsbf 
uimsbf 



Note 1: In case one of the output channels is the LFE channel only a part of the
signal is decoded to three channels, denoted with nr_bands_coded_set2. The rest
of the signal is decoded to two channels, i.e., nr_bands_coded_set2 is set to
nr_bands_coded. Note that for the bands decoded to three channels there exist two
sets of IIDs, three sets of ICCs, two sets of IPDs and one set of OPDs.
Table 6 - Syntax of data1to3 (Alternative B)



Syntax 


Num. 
oils 


Mnemonic 


datalto3{ 

if (enable_angles) { 
angle mode 

} 

if (enable Jpdopd) { 
ipdopd mode 

} 

fineq_res 


1 
1 
• • 


uimsbf 


1 

?? 


uimsbf 


2 


Table 17 


if (var_framing) { 

for (e=0; e< num_env; e++) 

env_pos[e] 

} 

} 

for (ch=0; 
ch<metiiod_out_ch[nr__steps]; ch++) { 

if 

(output_ch[nr_steps,ch] — 6) { 

nr__uanas coaeu setz 

} 

} 


1 

2 


uimsbf 
uimsbf 


5 


uimsbf 


?? 


Note 2 


1 

if (enable_angles) { 

for (e=0; e<num_env; e++) { 
angle_dt[e] 


1 


uimsbf 


angle_data(e,nr_bands_coded) 
angle_dt[e] 


1 


uimsbf 


angle data(e,nr bands coded set2) 
} 






} 

if (enable„ipdopd) { 
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nr = 

n]in(nr j3ands_coded_set2,nr_ipdopd_coded^ ; 

for (e==0; e<num_env; e++) { 
ipd_dt[e] 

ipd_data(e,nr_ipdopd_coded) 

ipd_dt [e] 

ipd_data(e,nr) 

opd_dt[e] 

opd_data(e,nr) 



) 



uimsbf 
uimsbf 
uimsbf 



Note 1: In case one of the output channels is the LFE channel only a part of the
signal is decoded to three channels, denoted with nr_bands_coded_set2. The rest
of the signal is decoded to two channels, i.e., nr_bands_coded_set2 is set to
nr_bands_coded. Note that for the bands decoded to three channels there exist two
sets of IIDs, three sets of ICCs, two sets of IPDs and one set of OPDs.



The data1to5() and data1to51() elements are straightforward extensions of the
data1to3() element with an increased number of parameter sets.



Table 7 - Syntax of iid_data()

Syntax                                                        Num. bits   Mnemonic
iid_data(e, nr_iid_par) {
  if (iid_dt[e]) {
    for (b=0; b<nr_iid_par; b++) {
      iid_par_dt[e][b] =
        sa_huff_dec(huff_iid_dt[iid_quant], bs_codeword);     ?? ??       bslbf
    }
  }
  else {
    for (b=0; b<nr_iid_par; b++) {
      iid_par_df[e][b] =
        sa_huff_dec(huff_iid_df[iid_quant], bs_codeword);     ?? ??       bslbf
    }
  }
}
Table 8 - Syntax of icc_data()

Syntax                                                        Num. bits   Mnemonic
icc_data(e, nr_icc_par) {
  if (icc_dt[e]) {
    for (b=0; b<nr_icc_par; b++) {
      icc_par_dt[e][b] =
        sa_huff_dec(huff_icc_dt, bs_codeword);                ?? ??       bslbf
    }
  }
  else {
    for (b=0; b<nr_icc_par; b++) {
      icc_par_df[e][b] =
        sa_huff_dec(huff_icc_df, bs_codeword);                ?? ??       bslbf
    }
  }
}


Table 9 - Syntax of ipd_data() 


Syntax 


Num. bits 


Mnemonic 


ipd_data(e, nr ipd par) { 

if(ipd^dt[e]){ 

for (b=0 ; b<nr_ipdopd„par; b-H-) 

ipd_par_dt[el[b] = 
saj[iuff_dec(huff_ipd dt,bs codeword); 

} 






99 99 


bslbf 


} 

else { 

for Cb=0 ; b<nr_ipdopd par; b++) 

{ 

ipd__par_df[e][b] = 
sajiuff dec(huff ipd df,bs codeword); 

} 

} 

} 






99 99 


bslbf 
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Table 10 - Syntax of opd._data() 



Syntax 


xNum. UlLiS 


JLVJLilClllVJillL' 


opd data(e, nr opd_par) { 

if (opd_dt[e]) { 

for (b=0 ; b<nr_ipdopd_jpar; 

h++){ 

opd_par_dt[e][b] = 
sa huff decOiuff opd_dt,bs_codeword); 

} 


99 99 


bslbf 


} 

else { 

for (b=0 ; b<nr_ipdopd_j)ar; 

b-H-){ 

opd_par_df[e][b] = 
sa huff decChuff opd_df,bs_codeword); 

} 

} 

1 


99 99 


bslbf 



Table 11 - Syntax of beta„data() 



Syntax 


Num. bits 


Mnemo 
nic 


beta_data(e, nr_beta_par) { 

if (beta_dt[e]) { 

for (b=0 ; b<nr_beta„par; b-H-) { 
beta_par_dt[e][b] = 
sa huff decChuff beta dt,bs_codeword); 

} 


99 99 


bslbf 


} 

else { 

for (b=0 ; b<nr J3etajpar; b-i-+) { 
beta_par_df[e][b] = 
sa huff decChuff beta df,bs_codew)rd); 

} 

} 

1 


99 99 


bslbf 



5 
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Table 12 - Syntax of gamma_data() 



Syntax 


Num, bits 


Mnemonic 


gamxna_data(e, nr_gamma_par) { 
if (gaiiima_dt[e]) { 

for (b=0 ; b<nr_gamma par; 

b++){ 

gamina_,par_dt[e][b] = 
sa_huff_dec(liuff_garnma dt,bs codeword); 

} 


99 99 


bslbf 




} 

else { 

for (b=0 ; b<nr__gaimna par; 

b++){ 

garQma_par_df[e][b] = 
sa3ufL.dec(huff_gamma df,bs codeword); 

} 

} 

} 


99 99 


bslbf 




Table 13 - Syntax of angle_data()

Syntax                                              Num. bits   Mnemonic
angle_data(e, nr_angle_par) {
  for (a = 0; a < nr_angles; a++) {                             Note 1
    if (angle_dt[a,e]) {
      for (b = 0; b < nr_angle_par; b++) {
        angle_par_dt[a][e][b] =
          sa_huff_dec(huff_gamma_dt, bs_codeword);  99 99       bslbf
      }
    }
    else {
      for (b = 0; b < nr_angle_par; b++) {
        angle_par_df[a][e][b] =
          sa_huff_dec(huff_gamma_df, bs_codeword);  99 99       bslbf
      }
    }
  }
}

Note 1: nr_angles follows from the method used.
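The syntax tables above only state that each parameter is read either as a `_dt` or as a `_df` Huffman-coded value. As a minimal sketch, and assuming (this section does not say so explicitly) that `_dt` values are differences with respect to the same band in the previous envelope and `_df` values are differences with respect to the previous band in the same envelope, the absolute indices could be rebuilt as follows. The function name `undo_differential` and the zero starting value for the first band are hypothetical.

```python
def undo_differential(deltas, previous_envelope=None):
    """Rebuild absolute parameter indices from decoded differences.

    previous_envelope is None  -> assumed frequency-differential (_df) coding:
                                  running sum over the bands, starting from 0.
    previous_envelope is a list -> assumed time-differential (_dt) coding:
                                  add each difference to the same band of the
                                  previous envelope.
    """
    if previous_envelope is None:
        values, running = [], 0
        for delta in deltas:
            running += delta
            values.append(running)
        return values
    return [prev + delta for prev, delta in zip(previous_envelope, deltas)]


first_envelope = undo_differential([3, 1, -2])                   # _df case
second_envelope = undo_differential([1, 0, -1], first_envelope)  # _dt case
```

Under these assumptions the first call accumulates over frequency and the second call updates the previous envelope band by band.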
Table 14: Dependencies of variable method

method   description   (illegible)   (illegible) out ch.   (illegible) increase ch.
0        1 to 2        1             2                     1
1        2 to 3        2             3                     1
2        1 to 3        1             3                     2
3        1 to 5        0 (fixed)     0 (fixed)             4
4        1 to 5.1      0 (fixed)     0 (fixed)             5
5        reserved
6        reserved
7        reserved









Table 15: num_env as a function of num_env_default

num_env_default   num_env
0                 0
1                 1
2                 2
3                 4



Table 16: Channel description

input_ch / output_ch   description             abbreviation
0                      mono                    M
1                      left (front)            L(f)
2                      right (front)           R(f)
3                      left surround           Ls
4                      right surround          Rs
5                      centre (or front)       C(f)
6                      low frequency effects   LFE
7                      surround                S



Table 17: nr_bands and nr_bands_coded as a function of freq_res

freq_res   nr_bands   nr_bands_coded (per default)
0          10         10
1          20         20
2          34         34
3          reserved
Annex A Parametric multichannel audio coder with 2, 3, 4 and 5 channel playback
compatibility

This invention is a multi-channel extension of the basic principle described in
WO03/090208-A1. The 5-channel content is downmixed into 2 channels combined with a
small amount of parametric overhead which enables 5-channel reconstruction at the decoder
side. Moreover, 2, 3, and 4-channel reproduction are also supported.

The important features of the proposed coder are:

- transmission of two audio channels (which can be encoded using an arbitrary stereo audio
codec), preferably obtained using principal component analysis on the left-front and
left-rear signal pair on the one hand, and a separate principal component analysis on the
right-front and right-rear signal pair on the other hand;

- transmission of parametric overhead, which includes:

- inter-channel level differences between left-front and left-rear channels;

- inter-channel level differences between right-front and right-rear channels;

- inter-channel coherence values between left-front and left-rear channels;

- inter-channel coherence values between right-front and right-rear channels;

- the power ratio between the centre channel and the sum of the powers of left-front,
left-rear, right-front and right-rear channels.

Additionally, the inter-channel phase differences and overall phase differences
between left-front and left-rear on the one hand, and right-front and right-rear on the other
hand, may also be included in the parametric bit stream.

The parameters described above are typically analyzed as a function of time
and frequency (i.e., for a set of time/frequency tiles).

Encoder 

Assume a five-channel audio signal $l_f[n]$, $l_r[n]$, $r_f[n]$, $r_r[n]$, $c[n]$, whose components
describe the discrete time-domain waveforms for the left-front, left-rear, right-front,
right-rear and centre signals, respectively. These five signals are segmented using a common
segmentation, preferably using overlapping analysis windows. Subsequently, each segment is
converted to the frequency domain using a complex transform (e.g., FFT). However, complex
filter-bank structures may also be appropriate to obtain time/frequency tiles. This process
results in segmented, sub-band representations of the input signals (which will be denoted by
$L_f[k]$, $L_r[k]$, $R_f[k]$, $R_r[k]$ and $C[k]$, with $k$ denoting the frequency index).
As a first step, the relevant parameters between left-front and left-rear are
estimated. These parameters include the level difference ($IID_L$), the (average) phase
difference ($IPD_L$) and the coherence ($ICC_L$):

$$IID_L = 10 \log_{10} \left( \frac{\sum_k L_f[k] L_f^*[k]}{\sum_k L_r[k] L_r^*[k]} \right)$$

$$IPD_L = \angle \left( \sum_k L_f[k] L_r^*[k] \right)$$

$$ICC_L = \frac{\left| \sum_k L_f[k] L_r^*[k] \right|}{\sqrt{\sum_k L_f[k] L_f^*[k] \, \sum_k L_r[k] L_r^*[k]}}$$

This process is repeated for the right-front and right-rear channels, resulting in
$IID_R$, $IPD_R$ and $ICC_R$.
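A minimal sketch of the parameter estimation for one time/frequency tile, assuming the usual definitions: the level difference as a power ratio in dB, the phase difference as the angle of the complex cross-correlation, and the coherence as its normalised magnitude. The function name `spatial_parameters` and the test data are illustrative only.

```python
import cmath
import math


def spatial_parameters(front, rear):
    """Return (IID in dB, IPD in radians, ICC) for two lists of complex bins."""
    p_front = sum(abs(x) ** 2 for x in front)
    p_rear = sum(abs(x) ** 2 for x in rear)
    cross = sum(x * y.conjugate() for x, y in zip(front, rear))
    iid = 10.0 * math.log10(p_front / p_rear)
    ipd = cmath.phase(cross)
    icc = abs(cross) / math.sqrt(p_front * p_rear)
    return iid, ipd, icc


# Rear signal: half the amplitude of the front signal, lagging by 0.3 rad.
lf = [1 + 0j, 0 + 2j, -1 + 1j]
lr = [0.5 * x * cmath.exp(-0.3j) for x in lf]
iid, ipd, icc = spatial_parameters(lf, lr)
```

For this fully coherent pair the sketch yields a level difference of about 6 dB, a phase difference of 0.3 rad and a coherence of 1.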

The second step consists of a principal component analysis (PCA) of the two
signals left-front ($L_f$) and left-rear ($L_r$). To be more specific, these two input signals are
rotated in order to obtain a dominant signal ($Y[k]$) and a residual signal ($Q[k]$), using a
rotation angle $\alpha$ which maximizes the energy of the dominant signal:

$$\begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix} = \begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} L_f[k] \exp(j(-OPD_L)) \\ L_r[k] \exp(j(-OPD_L + IPD_L)) \end{bmatrix}$$

Here, the angle $OPD_L$ denotes an overall phase rotation angle, while $IPD_L$
ensures maximum phase-alignment of the two signals $L_f$ and $L_r$. The rotation angle $\alpha$ can be
derived from the $IID_L$ and $ICC_L$ parameters following

$$\alpha = \frac{1}{2} \arctan \left( \frac{2 g \, ICC_L}{g^2 - 1} \right)$$

with

$$g = 10^{IID_L / 20}$$
The signal $Q[k]$ is subsequently discarded, and signal $Y[k]$ is scaled by a scalar
$\beta$ to obtain $L[k]$ in such a way that $L[k]$ has the same power as the power of $Q[k]$ plus the
power of $Y[k]$ (i.e., the signal $Q[k]$ is discarded while the loss in signal power is compensated
for by scaling of $Y[k]$). It can be shown that the required scale factor is equal to:




This process is also repeated for the right-front and right-rear signal pair,
resulting in signal $R[k]$.
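The rotate, discard and rescale steps for one front/rear pair can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the overall phase angle OPD is taken as 0, the rotation angle is evaluated with `atan2` so that the energy-maximising quadrant is chosen, and the scale factor is computed directly from the power-preservation requirement stated in the text. The function name `pca_downmix_pair` is hypothetical.

```python
import cmath
import math


def pca_downmix_pair(front, rear):
    """Rotate a front/rear pair, drop the residual and rescale the dominant part."""
    p_f = sum(abs(x) ** 2 for x in front)
    p_r = sum(abs(x) ** 2 for x in rear)
    cross = sum(x * y.conjugate() for x, y in zip(front, rear))
    ipd = cmath.phase(cross)
    icc = abs(cross) / math.sqrt(p_f * p_r)
    g = math.sqrt(p_f / p_r)                      # equals 10 ** (IID / 20)
    alpha = 0.5 * math.atan2(2.0 * g * icc, g * g - 1.0)

    # Phase-align the rear signal to the front signal (OPD assumed to be 0).
    rear_aligned = [y * cmath.exp(1j * ipd) for y in rear]
    y_dom = [math.cos(alpha) * f + math.sin(alpha) * r
             for f, r in zip(front, rear_aligned)]
    q_res = [-math.sin(alpha) * f + math.cos(alpha) * r
             for f, r in zip(front, rear_aligned)]

    p_y = sum(abs(v) ** 2 for v in y_dom)
    p_q = sum(abs(v) ** 2 for v in q_res)
    beta = math.sqrt((p_y + p_q) / p_y)           # compensates for dropping Q
    return [beta * v for v in y_dom], alpha, beta, p_y, p_q


front = [1 + 1j, 2 - 1j, -0.5 + 0.3j]
rear = [0.4 + 0.2j, 0.9 - 0.8j, 0.1 + 0.5j]
downmix, alpha, beta, p_y, p_q = pca_downmix_pair(front, rear)
```

Because the rotation is unitary, the rescaled dominant signal carries the summed power of the front and rear inputs.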

The last step entails mixing the centre signal $C[k]$ into both $L[k]$ and $R[k]$,
resulting in the stereo output signal pair $L_{OUT}[k]$, $R_{OUT}[k]$:

$$\begin{bmatrix} L_{OUT}[k] \\ R_{OUT}[k] \end{bmatrix} = \begin{bmatrix} L[k] + \varepsilon C[k] \\ R[k] + \varepsilon C[k] \end{bmatrix}$$

Here, $\varepsilon$ denotes a weight that determines the strength of $C[k]$ in the downmix
(typically 0.707). A parameter $IID_C$ that describes the power of $C$ with respect to the power
of $L$ and $R$ is extracted:

$$IID_C = 10 \log_{10} (\ldots)$$
The process as described above is repeated for each time/frequency tile.
Subsequently, the signals $L_{OUT}[k]$ and $R_{OUT}[k]$ are transformed to the time domain and
combined with previous segments using overlap-add, resulting in the output signals $l_{OUT}[n]$
and $r_{OUT}[n]$. A schematic overview of the encoder is shown in Fig. A1.
Fig. A1. Schematic overview of the encoder



Decoder for stereo playback

For stereo playback, the transmitted signals $l_{OUT}[n]$ and $r_{OUT}[n]$ are reproduced
over the two playback channels, without further processing.

Decoder for 3-channel playback

For 3-channel playback, the two received channels $l_{OUT}[n]$ and $r_{OUT}[n]$ are
segmented and transformed to the frequency domain. Subsequently, the output signals $L[k]$,
$R[k]$ and $C[k]$ are obtained as follows:



'my 













^ LOGOUT ^RC^OUT_ 



with 
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ol='£Uk}L\k] 

k 



^ 2 + 10"^^=''° 



Decoder for 5-channel playback 

For five-channel playback, first the 3-channel playback reconstruction as
described above is performed, resulting in signals $L[k]$, $R[k]$ and $C[k]$. The next step entails
splitting $L[k]$ into $L_f[k]$ and $L_r[k]$, and splitting $R[k]$ into $R_f[k]$ and $R_r[k]$. This splitting
process is performed using the inverse PCA rotation as used in the encoder. The dominant signal
$Y[k]$ and residual signal $Q[k]$, which are required for the inverse PCA rotation, are obtained as
follows:



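$$\begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix} = \begin{bmatrix} L[k] \cos\gamma \\ H[k] L[k] \sin\gamma \end{bmatrix}$$

Here, $H[k]$ denotes an all-pass decorrelation filter to obtain a decorrelated
version of $L[k]$. The angle $\gamma$ is given by: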



r 

y = arctan 



V 



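Subsequently, the signals $L_f[k]$ and $L_r[k]$ are obtained using inverse PCA:

$$\begin{bmatrix} L_f[k] \\ L_r[k] \end{bmatrix} = \begin{bmatrix} \exp(j OPD_L) & 0 \\ 0 & \exp(j OPD_L - j IPD_L) \end{bmatrix} \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} Y[k] \\ Q[k] \end{bmatrix}$$

This process is then repeated for the right channel.
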
Decoder for 4-channel playback

The decoder for 4-channel playback can be simply obtained by first decoding 5 channels ($l_f$,
$l_r$, $c$, $r_f$ and $r_r$), followed by mixing of the centre channel ($c$) into front left and front right:

$$l_{f,playback} = l_f + 0.707 \, c$$
$$r_{f,playback} = r_f + 0.707 \, c$$

The factor 0.707 ensures that the total power of the centre channel is constant,
independent of playback through the single centre speaker or as phantom source created by
left front and right front. The surround channels remain unchanged (i.e., the signals are the
same as for 5-channel playback).
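The 4-channel mix can be sketched in a few lines. The function name `four_channel_from_five` is hypothetical; the weight 0.707 is the value given in the text. The check at the end illustrates the power argument: a centre signal fed to both front speakers with weight 0.707 carries (almost exactly) the same total power as the centre signal itself.

```python
def four_channel_from_five(lf, lr, c, rf, rr, weight=0.707):
    """Fold the centre channel into the two front channels; surrounds pass through."""
    lf_playback = [a + weight * x for a, x in zip(lf, c)]
    rf_playback = [a + weight * x for a, x in zip(rf, c)]
    return lf_playback, lr, rf_playback, rr


# Centre-only content: the front channels are silent before mixing.
centre = [1.0, -2.0, 0.5]
silence = [0.0, 0.0, 0.0]
lf_pb, lr_pb, rf_pb, rr_pb = four_channel_from_five(
    silence, silence, centre, silence, silence)

p_centre = sum(x * x for x in centre)
p_phantom = sum(x * x for x in lf_pb) + sum(x * x for x in rf_pb)
```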
Annex B Coding the low frequency effects (LFE) channel in parametric multichannel audio
coding

The multi-channel audio coder described on the previous pages does not
support the coding of a low frequency effects (LFE) channel, which is incorporated in a
standard commonly known as 5.1.

The proposed coder incorporates the LFE channel. To this end additional
parameters need to be sent to the decoder. At the same time, the 2-channel down-mix is
modified to enable the decoding of the LFE channel. This is done in such a way that the good
quality of the stereo image of the 2-channel down-mix is preserved.

The parameters are typically analysed as a function of time and frequency (i.e.,
for a set of time/frequency tiles). The bandwidth of the LFE is typically limited to
approximately 120 Hz. The proposed solution allows for a variable bandwidth of the LFE
channel.

A multichannel coder is described in Annex A. A schematic overview of its
encoder is shown in Fig. B1. In the following, only the changes required for incorporating the
LFE channel are explained.

Encoder 

In Fig. B2, a schematic overview of the encoder incorporating the subwoofer
channel is shown.

Assume that $c[n]$ and $lfe[n]$ describe the discrete time-domain waveforms for
the centre and the LFE signal respectively. The signal $Cs$ and the parameters from the block
'parameter analysis' are obtained from signals $c$ and $lfe$ in the same way in which the signal $L$
and the parameters from the block 'parameter analysis' are obtained from signals $l_f$ and $l_r$, as
described in Annex A. There is however a difference. Because of the low-frequency
behaviour of the signal $lfe$, this is done only for a limited number of frequency subbands. To
this end, a parameter describing the number of frequency subbands occupied by the signal $lfe$
is incorporated in parameter set 4. In the remaining higher subbands, only the signal $C$ is
transmitted. The other parameters included in parameter set 4 are the level difference (IID)
and possibly the phase difference (IPD) between the centre and the LFE channel. If the IPD
is sent, the OPD parameter also needs to be transmitted.
Fig. B1. Schematic overview of the encoder described in Annex A



Decoder for 5.1-channel playback

For 5.1-channel playback, the signals $C[k]$ and $LFE[k]$ are obtained from the
reconstructed signal $Cs[k]$ similar to the way in which signals $L_f[k]$ and $L_r[k]$ are obtained
from $L[k]$ as described in Annex A. The parameter ICC, which is not transmitted, is set
to 1. If the IPD and OPD are not transmitted, they are set to 0.
Fig. B2. Schematic overview of the encoder including an LFE channel
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Annex C Reducing the inter-channel interference 

The multi-channel audio coder described in Annexes A and B aims at:

- Approximating the multi-channel audio with two channels and parametric overhead. This is
done for low bit rate reasons.

- Backwards compatibility with 2, 3, 4 and 5-channel reproduction systems. To
this end a good quality of the stereo image of the 2 transmitted channels is required.

The multichannel audio coder described in Annexes A and B exhibits a
significant amount of inter-channel interference, mainly due to the demand for a good stereo
image of the 2-channel down-mix. The current invention significantly reduces the
inter-channel interference. In this way, the quality of the reproduced multi-channel audio is
significantly enhanced.

The coder proposed in this Annex C represents N input channels by 2 down-mix
channels and parametric overhead. In order to get the best possible reconstruction, in the
sense of least-square-errors, of the N input channels at the decoder using only 2 channels,
Principal Component Analysis (PCA) should be used. Perfect reconstruction of the N input
channels at the decoder is only possible if all N channels from PCA are used. A drawback of
PCA is the fact that no control can be exerted over the 2 down-mix channels. This means that
the abovementioned requirement regarding the good quality of the stereo image is not met
when employing PCA.

As in the case of PCA, also in the case of employing 2 down-mix channels
that do have a good quality of the stereo image, a perfect reconstruction at the decoder is only
possible when these 2 down-mix channels are extended with an appropriate set of N-2
channels. As opposed to PCA, whose N channels are orthogonal so that the N-2 discarded
channels cannot be predicted using the 2 down-mix channels, now the N-2 channels can, to
some extent, be predicted from the 2 down-mix channels. The proposed coder exploits this
predictability at the decoder. In order to do so, parameters need to be sent to the decoder.
These parameters are typically analysed as a function of time and frequency (i.e., for a set of
time/frequency tiles).

As compared to Annex A or Annex B the following changes are made in the
proposed coder:

- at the encoder: the parameters required at the decoder have to be computed.

- in the bit-stream: the parameters required at the decoder have to be included.

- at the decoder: a parameter dependent up-mix from 2 to 5 channels is performed (in
Annex A, a fixed up-mix is performed).
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Encoder 

Assume an N-channel audio signal, where $z_1[n]$, $z_2[n]$, ..., $z_N[n]$ describe the
discrete time-domain waveforms of the N channels. These N signals are segmented using a
common segmentation, preferably using overlapping analysis windows. Subsequently, each
segment is converted to the frequency domain using a transform (e.g., FFT). However,
filter-bank structures may also be appropriate to obtain time/frequency tiles. This process
results in segmented, sub-band representations of the input signals, which will be denoted by
$Z_1[k]$, $Z_2[k]$, ..., $Z_N[k]$ with $k$ denoting the frequency index.

From these N channels, 2 down-mix channels are created, being $L_0[k]$ and
$R_0[k]$, which are also segmented, sub-band representations. Each down-mix channel is a
linear combination of the N input signals:

$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k], \qquad R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k].$$

The parameters $\alpha_i$ and $\beta_i$ can be set on the basis of a certain criterion. In ID 695741, this
criterion is the good stereo image of the stereo signal consisting of $L_0[k]$ and $R_0[k]$.

Perfect reconstruction of the N input channels at the decoder is only possible
when these 2 down-mix channels are extended with an appropriate set of N-2 channels. As
opposed to PCA, whose N channels are orthogonal so that the N-2 discarded channels cannot
be predicted using the 2 most relevant channels, now the N-2 channels can, to some extent,
be predicted from the 2 down-mix channels.

If the N-2 discarded channels are denoted by $C_{0,i}[k]$, then these channels are
predicted from the two down-mix channels by:

$$\hat{C}_{0,i}[k] = C_{1,i} L_0[k] + C_{2,i} R_0[k].$$

For choosing parameters $C_{1,i}$ and $C_{2,i}$ various optimisation criteria are possible. We choose as
an optimisation criterion the minimal Euclidean norm of the difference of signal $C_{0,i}[k]$ and its
estimation $\hat{C}_{0,i}[k]$. Parameters $C_{1,i}$ and $C_{2,i}$ need to be sent to the decoder.

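It can be shown that the parameters $C_{1,i}$ and $C_{2,i}$ are related to the parameters
that are obtained when minimising the Euclidean norm of the difference of the original input
channel $Z_i[k]$ and its estimation at the decoder $\hat{Z}_i[k]$. A coder that uses these latter
parameters is further described.

The square of the Euclidean norm of the difference of the original input
channel $Z_i[k]$ and its estimation at the decoder $\hat{Z}_i[k]$ can be written as:

$$\sum_k \left| Z_i[k] - \hat{Z}_i[k] \right|^2,$$

with

$$\hat{Z}_i[k] = C_{1,z_i} L_0[k] + C_{2,z_i} R_0[k].$$

Minimisation of $\sum_k |Z_i[k] - \hat{Z}_i[k]|^2$ leads to the following expressions:

$$C_{1,z_i} = \frac{\langle L_0[k], Z_i[k] \rangle^* \, \|R_0[k]\|^2 - \langle R_0[k], Z_i[k] \rangle^* \, \langle L_0[k], R_0[k] \rangle^*}{\|L_0[k]\|^2 \, \|R_0[k]\|^2 - |\langle L_0[k], R_0[k] \rangle|^2},$$

$$C_{2,z_i} = \frac{\langle R_0[k], Z_i[k] \rangle^* \, \|L_0[k]\|^2 - \langle L_0[k], Z_i[k] \rangle^* \, \langle L_0[k], R_0[k] \rangle}{\|L_0[k]\|^2 \, \|R_0[k]\|^2 - |\langle L_0[k], R_0[k] \rangle|^2},$$

with

$$\|A[k]\|^2 = \sum_k A[k] A^*[k], \qquad \langle A[k], B[k] \rangle = \sum_k A[k] B^*[k].$$

For the coefficients $C_{1,z_i}$ and $C_{2,z_i}$, the following relations can be derived:

$$\sum_{i=1}^{N} \alpha_i C_{1,z_i} = 1, \quad \sum_{i=1}^{N} \alpha_i C_{2,z_i} = 0, \quad \sum_{i=1}^{N} \beta_i C_{1,z_i} = 0, \quad \sum_{i=1}^{N} \beta_i C_{2,z_i} = 1.$$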
Having N channels, with 2 parameters per channel ($C_{1,z_i}$ and $C_{2,z_i}$ for the i-th
channel), but at the same time 4 equations describing relations between these parameters,
2N-4 parameters need to be sent to the decoder.
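The least-squares prediction and the four relations between the coefficients can be checked numerically. The sketch below solves the 2x2 normal equations of the minimisation for each input channel. It assumes three real-valued input channels L, R and Cs and the down-mix L0 = L + Cs, R0 = R + Cs (unit mixing weights are an assumption made here for illustration); the helper names `inner` and `prediction_coefficients` are hypothetical.

```python
def inner(a, b):
    """Inner product <A, B> = sum_k A[k] * conj(B[k])."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def prediction_coefficients(z, l0, r0):
    """Coefficients (c1, c2) minimising sum_k |z[k] - c1*l0[k] - c2*r0[k]|^2."""
    ll = inner(l0, l0).real
    rr = inner(r0, r0).real
    lr = inner(l0, r0)
    zl = inner(z, l0)
    zr = inner(z, r0)
    det = ll * rr - abs(lr) ** 2
    c1 = (zl * rr - zr * lr.conjugate()) / det
    c2 = (zr * ll - zl * lr) / det
    return c1, c2


left = [1.0, 0.5, -0.2, 0.3]
right = [0.2, -0.7, 0.4, 0.1]
centre = [0.3, 0.3, -0.1, 0.6]
l0 = [a + c for a, c in zip(left, centre)]    # alpha = (1, 0, 1)
r0 = [b + c for b, c in zip(right, centre)]   # beta  = (0, 1, 1)

c1_l, c2_l = prediction_coefficients(left, l0, r0)
c1_r, c2_r = prediction_coefficients(right, l0, r0)
c1_cs, c2_cs = prediction_coefficients(centre, l0, r0)
```

With these mixing weights the alpha-weighted sums reduce to `c1_l + c1_cs` and `c2_l + c2_cs`, and the beta-weighted sums to `c1_r + c1_cs` and `c2_r + c2_cs`, so only two of the six coefficients are independent.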

The process described above is repeated for each time/frequency tile.
Subsequently, the signals $L_0[k]$ and $R_0[k]$ are transformed to the time domain and combined
with previous segments using overlap-add, resulting in the output signals $l_0[n]$ and $r_0[n]$.

Summarising, the encoder sends to the decoder 2 down-mix channels, $l_0[n]$ and $r_0[n]$, and for
each time/frequency tile 2N-4 parameters that describe how to retrieve the input channels
from the 2 down-mix channels.

Decoder 

At the decoder side, for each time/frequency tile, first the coefficients $C_{1,z_i}$ and
$C_{2,z_i}$ are computed for all N channels, using the 2N-4 coefficients that are transmitted and the
4 equations describing relations between the coefficients. Then each input channel $Z_i[k]$ is
approximated by $\hat{Z}_i[k]$, with

$$\hat{Z}_i[k] = C_{1,z_i} L_0[k] + C_{2,z_i} R_0[k],$$

where $L_0[k]$ and $R_0[k]$ are the received 2 down-mix channels.

Incorporation of the coder in the multi-channel coder described in Annex A or Annex B

A schematic overview of the encoder of the multi-channel coder described in
Annex A and Annex B is given in Fig. A1 and Fig. B2 respectively. The coder described in
this Annex C can be used to replace the block called 'Mixing And Parameter extraction',
that has as inputs the channels $L$, $R$ and $Cs$ and as outputs the channels $L_0$ and $R_0$ and
parameter set 3. In order to get a good stereo image of the 2 down-mix channels $L_0$ and $R_0$,
they are chosen as:

Because of the three input channels (hence N = 3), only 2N-4, or 2, parameters
need to be transmitted to the decoder. It is advantageous to transmit 2 parameters that have
the same range (e.g. $C_{1,L}$ and $C_{2,R}$), so that the same quantisation can be applied to them.

At the decoder side, for 3 or more channel playback, first all 6 parameters ($C_{1,L}$,
$C_{2,L}$, $C_{1,R}$, $C_{2,R}$, $C_{1,Cs}$ and $C_{2,Cs}$) are computed using the 2 transmitted parameters and the
relations between the 6 parameters. For example, if $C_{1,L}$ and $C_{2,R}$ are transmitted, then it
follows that $C_{2,L} = C_{2,R} - 1$, $C_{1,R} = C_{1,L} - 1$, $C_{1,Cs} = 1 - C_{1,L}$ and $C_{2,Cs} = 1 - C_{2,R}$. The output
signals $L[k]$, $R[k]$ and $Cs[k]$ are obtained as follows:

$$\begin{bmatrix} L[k] \\ R[k] \\ Cs[k] \end{bmatrix} = \begin{bmatrix} C_{1,L} L_0[k] + C_{2,L} R_0[k] \\ C_{1,R} L_0[k] + C_{2,R} R_0[k] \\ C_{1,Cs} L_0[k] + C_{2,Cs} R_0[k] \end{bmatrix}$$













Playback of 4- or 5-channels is explained in Annex A. 
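A minimal sketch of the 3-channel up-mix, using only the relations quoted in the text to derive the four untransmitted coefficients from the two transmitted ones. The function name `upmix_three` and the numeric values are illustrative assumptions.

```python
def upmix_three(l0, r0, c1_l, c2_r):
    """Rebuild L, R and Cs from the down-mix and the two transmitted coefficients."""
    c2_l = c2_r - 1.0
    c1_r = c1_l - 1.0
    c1_cs = 1.0 - c1_l
    c2_cs = 1.0 - c2_r
    left = [c1_l * a + c2_l * b for a, b in zip(l0, r0)]
    right = [c1_r * a + c2_r * b for a, b in zip(l0, r0)]
    centre = [c1_cs * a + c2_cs * b for a, b in zip(l0, r0)]
    return left, right, centre


l0 = [1.0, -0.5, 2.0]
r0 = [0.25, 0.75, -1.0]
left, right, centre = upmix_three(l0, r0, c1_l=0.8, c2_r=0.7)
```

A property that follows directly from these relations: the reconstructed left plus centre signal equals the left down-mix channel, and the reconstructed right plus centre signal equals the right down-mix channel, so re-mixing the three outputs with unit weights reproduces the transmitted stereo pair.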



PHNL040388EPQ 



42 02.04.2004 

Annex D Improved stereo coding

Traditional coding schemes, e.g. MPEG-1 Layer III (mp3, [1]), employ
stereo coding tools to improve the coding efficiency. One of these coding tools is known as
Mid/Side (M/S) stereo coding or Sum/Difference stereo coding [2]. Using M/S coding a
stereo signal consisting of a left signal $l[n]$ and a right signal $r[n]$ is coded as a sum signal
$m[n]$ and a difference signal $s[n]$ (see footnote 1):

$$m[n] = r[n] + l[n]$$
$$s[n] = r[n] - l[n]$$



For (almost) identical signals $l[n]$ and $r[n]$ this gives a large coding gain as
the corresponding difference signal is close to being zero, whereas the sum signal
contains practically all the signal energy. Hence, in this situation the bit rate required for
coding the sum and difference signals is close to the bit rate required for coding only a single
channel.
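The coding-gain argument can be illustrated with a short sketch, using the sum and difference as described (constant omitted, difference taken as right minus left). The function names are hypothetical.

```python
def ms_encode(left, right):
    """Sum/difference mapping: m = r + l, s = r - l."""
    mid = [r + l for l, r in zip(left, right)]
    side = [r - l for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse mapping: l = (m - s) / 2, r = (m + s) / 2."""
    left = [(m - s) / 2 for m, s in zip(mid, side)]
    right = [(m + s) / 2 for m, s in zip(mid, side)]
    return left, right


left = [0.5, -0.25, 1.0]
right = [0.5, -0.25, 1.0]          # identical to the left signal
mid, side = ms_encode(left, right)
left_out, right_out = ms_decode(mid, side)
```

For identical inputs the difference signal is all zeros, so only the sum signal needs to be coded.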

Alternatively, apart from a constant scale factor, the mid-side coding process can be
described by means of a rotation matrix:

$$\begin{bmatrix} m[n] \\ s[n] \end{bmatrix} = \begin{bmatrix} \cos(\pi/4) & \sin(\pi/4) \\ -\sin(\pi/4) & \cos(\pi/4) \end{bmatrix} \begin{bmatrix} l[n] \\ r[n] \end{bmatrix}$$

It is obvious that the left and right signals have been rotated over an angle of
$\pi/4$. This is illustrated in Figure D1. The sum signal can be interpreted as a projection of
the left and right samples onto the line $l = r$, whereas the difference signal can be interpreted as
a projection of the left and right samples onto the line $l = -r$.



Footnote 1: Usually the sum and difference are calculated as $m[n] = c \cdot (l[n] + r[n])$ and
$s[n] = c \cdot (l[n] - r[n])$. For explanatory reasons, the constant $c$ has been discarded and the
sign of $s[n]$ has been inverted.
Figure D1: Rotation of the left and right signals $l[n]$ and $r[n]$ over an angle of $\pi/4$.

In order to obtain a minimum signal power in the residual signal (i.e.,
maximum coding gain) for a wide class of input signals, the rotation angle needs to be
variable. An improved signal mapping, applied in a sub-band coding system using a variable
rotation angle, is outlined in [3, 4]. The following signal mapping is applied:

$$\begin{bmatrix} m'[n] \\ s'[n] \end{bmatrix} = c \begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix} \begin{bmatrix} l[n] \\ r[n] \end{bmatrix}$$

where $m'[n]$ and $s'[n]$ represent the dominant and the residual signal respectively and the
angle $\alpha$ is chosen to minimize the power of the signal $s'[n]$. Due to the unitary rotation
operation, the power of $m'[n]$ is then maximized. This process is illustrated in Figure D2.
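How a variable angle beats the fixed M/S angle can be sketched for real-valued signals. The closed-form angle below follows from minimising the residual power of the rotation with respect to the angle; it is a derivation made for this illustration (with c = 1), not a formula quoted from [3, 4], and the function names are hypothetical.

```python
import math


def rotate(left, right, alpha):
    """Apply the rotation (with c = 1) and return (dominant, residual)."""
    dominant = [math.cos(alpha) * l + math.sin(alpha) * r
                for l, r in zip(left, right)]
    residual = [-math.sin(alpha) * l + math.cos(alpha) * r
                for l, r in zip(left, right)]
    return dominant, residual


def optimal_angle(left, right):
    """Angle that minimises the residual power for real-valued signals."""
    p_l = sum(l * l for l in left)
    p_r = sum(r * r for r in right)
    cross = sum(l * r for l, r in zip(left, right))
    return 0.5 * math.atan2(2.0 * cross, p_l - p_r)


left = [1.0, -2.0, 0.5, 3.0]
right = [0.5 * x for x in left]             # scaled copy of the left signal
alpha = optimal_angle(left, right)
_, s_opt = rotate(left, right, alpha)
_, s_ms = rotate(left, right, math.pi / 4)  # fixed M/S angle

p_opt = sum(v * v for v in s_opt)
p_ms = sum(v * v for v in s_ms)
```

For a right signal that is a scaled copy of the left signal, the optimal rotation removes the residual entirely, while the fixed angle of pi/4 leaves a clearly non-zero residual.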
Using regular M/S coding (as represented by m and s in Figure D1), the signal
illustrated in Figure D2 would still result in a residual signal with considerable energy. As
such, the coding gain obtained by M/S coding would be marginal. However, using the
variable rotation angle $\alpha$ as illustrated above, a very small residual signal can be obtained.
Obviously, this rotation technique works particularly well when the left signal is
approximately a scaled version of the right signal.

Both the M/S coding technique (i.e., rotation with a fixed angle) and the
variable rotation technique described above are typically not applied to the broadband signal,
but rather to signals (or frequency domain representations) representing only a smaller part of
the full bandwidth of the audio signal, as e.g. described in [3, 4].

Although the rotation technique as described in [3, 4] eliminates many of the
disadvantages of M/S coding, it is still suboptimal for signals having a strong phase or time
offset. This is illustrated in Figure D3.
Figure D3: Rotation of left and right signals along an angle $\alpha$.

The oval-like structure indicates a time or phase delay between $l$ and $r$.
Because the rotation implies a real-valued projection, the residual signal will still contain a
significant amount of energy (although usually less than using regular M/S coding).

In order to further reduce the residual signal energy it is proposed to extend
the current signal rotation by employing complex-valued phase rotations to the left and right
signal components. From this point, it is assumed that the left and right signals are
represented by their complex-valued frequency domain representations $l[k]$ and $r[k]$. One
method to obtain such signal representation is as follows. First the left and right time domain
signal segments are windowed:

$$l_q[n] = l[n + qH] \cdot h[n], \qquad r_q[n] = r[n + qH] \cdot h[n],$$

where $q$ represents the frame index ($q = 0, 1, 2, \ldots$), $H$ represents the hop-size or update-size
and $n = 0 \ldots L-1$, where $L$ equals the length of window $h[n]$. These windowed segments are
then transformed to the frequency domain by means of a Discrete Fourier Transform (DFT):
$$l[k] = \sum_{n=0}^{N-1} l_q[n] \exp\left(-j \frac{2 \pi k n}{N}\right), \qquad r[k] = \sum_{n=0}^{N-1} r_q[n] \exp\left(-j \frac{2 \pi k n}{N}\right),$$

where $N$ represents the DFT length ($N \geq L$). Because of symmetry in the DFT only the first
$N/2 + 1$ points are preserved. Furthermore, in order to obtain energy preservation, the first
DFT points are scaled:

$$l[0] = l[0] / 2$$
$$r[0] = r[0] / 2$$



The signal model is extended in the following way:

$$\begin{bmatrix} m[k] \\ s[k] \end{bmatrix} = \begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix} \begin{bmatrix} e^{-j\varphi_1} & 0 \\ 0 & e^{j(-\varphi_1 + \varphi_2)} \end{bmatrix} \begin{bmatrix} l[k] \\ r[k] \end{bmatrix}$$

As can be observed from the equation above, the real-valued rotation (using a
variable rotation angle $\alpha$) is extended with a complex-valued phase modification matrix. The
angle $\varphi_2$ is used to minimize the energy of the residual signal by (phase-) rotating the right
signal. The common angle $\varphi_1$ can be used to maximize the continuation of the signal over
frame boundaries.
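The benefit of the additional phase rotation can be sketched on a handful of complex-valued frequency bins. The example compares the residual power after a real-valued rotation alone with the residual power when the right signal is first phase-aligned to the left signal. Assumptions made for this illustration: the common angle is set to 0, the aligning angle is taken as the phase of the complex cross-correlation, and the rotation angle is the power-minimising one for the given pair. The function name `residual_power` is hypothetical.

```python
import cmath
import math


def residual_power(left, right, phase_align):
    """Residual power after the optimal real rotation, with optional phase alignment."""
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    if phase_align:
        phi2 = cmath.phase(cross)
        right = [r * cmath.exp(1j * phi2) for r in right]
        cross = sum(l * r.conjugate() for l, r in zip(left, right))
    p_l = sum(abs(l) ** 2 for l in left)
    p_r = sum(abs(r) ** 2 for r in right)
    alpha = 0.5 * math.atan2(2.0 * cross.real, p_l - p_r)
    residual = [-math.sin(alpha) * l + math.cos(alpha) * r
                for l, r in zip(left, right)]
    return sum(abs(v) ** 2 for v in residual)


# Right signal: half the amplitude of the left signal with a 1.4 rad phase lag.
left = [1 + 0.5j, -0.3 + 2j, 0.8 - 1.1j]
right = [0.5 * l * cmath.exp(-1.4j) for l in left]

plain = residual_power(left, right, phase_align=False)
aligned = residual_power(left, right, phase_align=True)
```

With the phase offset removed, the right signal is a scaled copy of the left signal and the residual vanishes; without the alignment a substantial residual remains.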

After signal mapping/modification the dominant and residual time domain
signals $m[n]$ and $s[n]$ are obtained by first applying the inverse DFT on the frequency
domain representations $m[k]$ and $s[k]$:

$$m_q[n] = \frac{1}{N} \sum_{k=0}^{N-1} m[k] \exp\left(j \frac{2 \pi k n}{N}\right), \qquad s_q[n] = \frac{1}{N} \sum_{k=0}^{N-1} s[k] \exp\left(j \frac{2 \pi k n}{N}\right),$$

where the dominant and residual frequency domain representations $m[k]$ and $s[k]$ have been
zero-padded to length $N$. The time domain signals are then obtained by means of overlap-add:

$$m[n + qH] = m[n + qH] + 2 \Re\{ m_q[n] \cdot h[n] \}$$
$$s[n + qH] = s[n + qH] + 2 \Re\{ s_q[n] \cdot h[n] \}$$
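The role of the half-spectrum, the scaling of the first DFT point and the factor 2 on the real part can be illustrated with a single-frame round trip. This is a sketch under stated assumptions: windowing and overlap-add are left out, the DFT is written out directly for clarity rather than speed, and the Nyquist bin is halved as well as bin 0 (the text only mentions bin 0; halving the Nyquist bin is an assumption needed here for an exact round trip with even DFT lengths). The function names are hypothetical.

```python
import cmath
import math


def analyse(frame, n_fft):
    """DFT of a real frame (implicitly zero-padded); keep bins 0 .. n_fft/2."""
    bins = [sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, x in enumerate(frame))
            for k in range(n_fft // 2 + 1)]
    bins[0] /= 2     # scaling of the first DFT point, as in the text
    bins[-1] /= 2    # assumption: the Nyquist bin is scaled in the same way
    return bins


def synthesise(bins, n_fft, length):
    """Zero-pad to n_fft bins, inverse DFT, and take twice the real part."""
    padded = bins + [0j] * (n_fft - len(bins))
    return [2.0 * (sum(b * cmath.exp(2j * math.pi * k * n / n_fft)
                       for k, b in enumerate(padded)) / n_fft).real
            for n in range(length)]


frame = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5]   # L = 6 samples, DFT length N = 8
rebuilt = synthesise(analyse(frame, 8), 8, len(frame))
```

Taking twice the real part restores the contribution of the discarded mirror-image bins, which is why the bins without a mirror image have to be halved beforehand.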

Alternatively, complex-modulated filter banks could be employed to obtain a 
complex- valued frequency domain representation. 

As an example, the following synthetic signal is mapped by the three different
methods described above:

$$l[n] = 0.5 \cos(0.32 n + 0.4) + 0.05 \, z_1[n] + 0.06 \, z_2[n]$$
$$r[n] = 0.25 \cos(0.32 n + 1.8) + 0.03 \, z_1[n] + 0.05 \, z_3[n]$$

where $z_1[n]$, $z_2[n]$ and $z_3[n]$ are independent white noise sequences with unit variance. Part
of the signals $l[n]$ and $r[n]$ are shown in Figure D4.
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Figure D4: Left and right signals l[n] and r[n] respectively 

Figure D5, Figure D6 and Figure D7 show the results for the M/S transform,
the signal rotation over an (optimal) angle $\alpha$ and the rotation over both an (optimal) angle $\alpha$
and phase rotation as proposed in this ID respectively. In this particular example the angles
$\alpha$, $\varphi_1$ and $\varphi_2$ are fixed. In a typical embodiment, these parameters are both time and
frequency dependent. In each figure, the top panel represents the dominant signal ($m[n]$); the
bottom panel shows the residual signal ($s[n]$).

The M/S mapping, as shown in Figure D5, clearly does not increase the coding
efficiency for this particular situation. As a matter of fact, the residual signal energy, i.e., the
energy of the signal $s[n]$, is higher than the energy of the input signal $r[n]$.
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Figure D5: Resulting signals after M/S mapping 

The mapping by means of applying the (optimal) signal rotation $\alpha$, as
illustrated in Figure D6, does not help for this particular signal either. Only a negligible energy
reduction is obtained.
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Figure D6: Resulting signals after rotation 

Finally, the results of the extension of the mapping as proposed in this ID are
shown in Figure D7. Here a clear reduction of residual signal energy is observed.
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Figure D7: Resulting signals after signal and phase rotation 

A block diagram of an encoder according to the invention is given in Figure
D8. The left and right frequency domain representations $l$ and $r$ are phase rotated to obtain

- a maximum coherence (angle $\varphi_2$) and

- an optimal signal continuation over time (angle $\varphi_1$).

Subsequently, the phase-rotated left and right signals are rotated over an angle
$\alpha$ for maximal reduction of the residual signal energy as described above. The parameters $\alpha$,
$\varphi_1$ and $\varphi_2$ are quantized and coded into the bit stream. The dominant signal $m$ and the
residual signal $s$ can be coded by two independent conventional mono audio coders (or of
course one dual mono encoder). Additionally, certain parts of the time-frequency plane of the
signal $s$, not perceptually contributing to the final output signal, can be discarded in the
time-frequency (t/f) selector unit. The overall bit stream is formed by merging the bit streams
corresponding to the dominant signal $m$ and the residual signal $s$ with the parameter bit stream.
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Figure D8: Block diagram of proposed encoder 
Figure D9 shows a block diagram of the decoder corresponding to the encoder
of Figure D8. First the bit stream is de-multiplexed into separate bit streams for the dominant
signal, the residual signal and the parameters. The bit streams for the dominant signal and the
residual signal are decoded resulting in the signals $m'$ and $s'$. Then the inverse rotation ($-\alpha$)
is applied to obtain preliminary left and right signal representations. Finally the left and right
signals, $l'$ and $r'$ respectively, are obtained by applying the inverse phase rotations ($-\varphi_1$ and
$-\varphi_2$).
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Figure D9: Block diagram of proposed decoder 
The invention can also be advantageously applied in combination with a
parametric stereo coding system [5]. A general block diagram of a parametric stereo coding
scheme is given in Figure D10. It is fairly similar to the block diagram of the proposed
encoder except for the fact that:

- no residual signal is being transmitted and

- the angle $\alpha$ is not transmitted, but instead an IID value and a coherence value are
transmitted.

The IID value represents the Inter-channel Intensity Difference, denoting the
(frequency and time variant) intensity differences between the left and right input channels.
The coherence value denotes the coherence, i.e., the similarity, between the left and right
input channels after phase synchronization. The angle $\alpha$ can be derived from the IID and
coherence value.
Figure D10: Block diagram of parametric stereo encoder

A corresponding decoder block diagram is shown in Figure D11. It
corresponds to the block diagram of Figure D9 except for:

- the residual signal is now estimated based on the dominant signal $m'$ by means of a
de-correlation process D and

- the amount of coherence between the left and right output signals is determined by a
scaling operation.

The scaling operation basically describes the ratio between the dominant
signal and the residual signal.
Figure D11: Block diagram of parametric stereo decoder
Figure D12 shows a block diagram of the parametric stereo encoder enhanced
with residual coding. With respect to Figure D10, the only difference resides in the coding of
(part of) the residual signal $s$. Which part of the residual signal is coded by Coder 2 is
determined by the time-frequency (t/f) selector unit.
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Figure D12: Block diagram of enhanced parametric stereo encoder 

A block diagram of the parametric stereo decoder enhanced with residual 
coding is shown in Figure D13. The bit stream is first de- multiplexed mto separate streams 
for the dominant signal, the residual signal and the stereo parameters. The dominant and 
residual signals are then decoded by Decoder 1 and Decoder 2, respectively. Those 
spectral/temporal parts of the residual signal which have been coded are signaUed either: 
imphcitly, by detecting "empty" areas in the time- frequency plane or 
explicitly, by means of bits in the parameter bit stream. 

This information is applied in the de-correlation unit D and the Combine unit 
to fill the empty time-frequency areas in the decoded residual signal with a synthetic residual 
signal. This synthetic residual signal is generated by using the decoded dominant signal m' 
and the de-correlation unit D. For all other time-frequency areas, the (transmitted) residual 
signal is applied to construct the signal s'. Note that for these areas, no scaling is applied. 
Hence, for these areas it can be advantageous to transmit the angle α in the encoder instead 
of the IID and coherence values, as the bit rate required for the single parameter α is smaller 
than the bit rate required for the IID and coherence parameters. However, transmission of α 
instead of IID and coherence values makes the system non-backwards compatible with the 
regular PS system. The subsequent stages of the decoder operate in the same fashion as the 
conventional parametric stereo decoder. 
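
The Combine step described above can be illustrated with a short sketch. It is not taken from the original text: the function names, the array layout (frames by frequency bins) and the way the scaling is passed in are assumptions made only for illustration.

```python
import numpy as np


def implicit_mask(decoded_residual, threshold=0.0):
    """Implicit signalling: a tile counts as coded if it is not "empty".

    The energy threshold is a hypothetical detection rule.
    """
    return np.abs(decoded_residual) > threshold


def combine_residual(decoded_residual, coded_mask, decorrelated_dominant, scale):
    """Build the residual s' from transmitted and synthetic tiles.

    decoded_residual      : complex array (frames, bins), output of Decoder 2
    coded_mask            : bool array (frames, bins), True where the residual was coded
    decorrelated_dominant : complex array (frames, bins), D applied to m'
    scale                 : real array (frames, bins), scaling from the stereo parameters
    """
    synthetic = scale * decorrelated_dominant
    # Transmitted tiles are used as they are (no scaling); the others are synthesized.
    return np.where(coded_mask, decoded_residual, synthetic)
```

With explicit signalling, `coded_mask` would instead be read from the parameter bit stream.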






Figure D13: Block diagram of enhanced parametric stereo decoder (blocks shown: Decoder 1, 
Decoder 2, Combine, Scaling, Signal rotations, Phase rotations) 

The criteria for deciding which time-frequency areas of the residual signal 
need to be coded are perceptually motivated. If (a time-frequency area of) the residual signal 
s does not contribute to the audio quality of the final decoder output signal, or if (a 
time-frequency area of) the de-correlated signal forms a perceptually valid representation of the 
(corresponding time-frequency area of the) residual signal, it is not necessary to code the 
residual signal. 

By coding different time-frequency aspects of the residual signal in the 
encoder and by multiplexing the corresponding data into a scalable bit stream, the enhanced 
parametric stereo codec can be extended to a bit-rate scalable codec. In a scalable system 
where the layers in the bit stream are dependent, the coded data corresponding to the 
perceptually most relevant time-frequency aspects should be placed in the base layer, and the 
less important data moved to refinement, or enhancement, layers. In this case the base layer 
would consist of the dominant bit stream, a first enhancement layer would consist of the 
stereo parameters and a second enhancement layer would consist of the residual bit stream. 
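
The layer ordering can be sketched as a simple data structure. This is an illustrative assumption, not a bit-stream syntax from the original text: the class name, field names and the mode labels are hypothetical.

```python
from dataclasses import dataclass

# Layer order, from base layer to last enhancement layer.
LAYER_ORDER = ("dominant", "stereo_params", "residual")


@dataclass
class ScalableFrame:
    dominant: bytes               # base layer: dominant bit stream
    stereo_params: bytes = b""    # first enhancement layer: stereo parameters
    residual: bytes = b""         # second enhancement layer: residual bit stream


def truncate(frame, layers_kept):
    """Drop enhancement layers, keeping the first `layers_kept` layers."""
    if not 1 <= layers_kept <= len(LAYER_ORDER):
        raise ValueError("layers_kept must be 1, 2 or 3")
    kept = {name: getattr(frame, name) for name in LAYER_ORDER[:layers_kept]}
    return ScalableFrame(**kept)


def decoder_mode(frame):
    """Select what a decoder can reconstruct from the layers that remain."""
    if frame.residual:
        return "enhanced parametric stereo"
    if frame.stereo_params:
        return "parametric stereo with synthetic residual"
    return "dominant signal only"
```

Because the layers are dependent, truncation always removes them from the end: the residual layer goes first, then the stereo parameters.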

When layers are removed from the scalable bit stream, and information 
regarding the residual signal is thus lost, the enhanced parametric stereo decoder can combine 
the decoded residual signal reconstructed from the data in the remaining layers with the 
synthetic residual signal in the manner described above to form a meaningful residual signal. 
Furthermore, if a decoder is not equipped with a second waveform decoder (for the residual 
signal), e.g. due to complexity restrictions, the signal could still be decoded, although this 
would result in a lower quality level. 

Further bit-rate reductions can be obtained by discarding the values for φ1 and 
φ2 in the bit stream. In that case, the decoder reconstructs the output signals l' and r' using a 
phase rotation of zero. This method effectively exploits the lack of sensitivity of the human 
auditory system to high-frequency (inter-aural) phase information. 
Annex E Post-processing of the individual multi-channel signals in the stereo down-mix 

Many of the multi-channel audio coders described hereinbefore generate a 
2-channel down-mix to be compatible with 2-channel reproduction systems. When this 
down-mix is created, the spatial impression of the original multi-channel mix is lost. The 
present invention makes it possible to improve the spatial image of the down-mix after its 
creation, based upon the parameters determined in the multi-channel encoder. Other 
post-processing techniques on the individual multi-channel contributions are also made possible. 

The proposed method makes possible a reconstruction of the multi-channel 
mix that is not affected by the post-processing. Post-processing in the decoder is also 
possible for stereo playback as a user-selectable option, without the necessity to determine the 
multi-channel signal first. 

Without post-processing the down-mix is comparable with the standard ITU 
down-mix. The proposed method, however, improves the down-mix significantly. This is a 
very important issue, because it is very probable that the quality of the down-mix will be one 
of the selection criteria within MPEG. 

The proposed method is able to determine the contribution in the down-mix of 
the original channels in the multi-channel mix with the help of the parameters determined in 
the encoder. In this way post-processing can be applied to specific channels of the 
multi-channel mix (for example: stereo widening of the rear channels), whilst the other channels 
are not affected. The post-processing does not affect the final multi-channel reconstruction. It 
can also be applied for an improved stereo playback without the necessity to determine the 
multi-channel mix first. 

This method differs from existing post-processing techniques in that it uses the 
knowledge of the original multi-channel mix (the determined parameters). 

Encoder 

Assume an N-channel audio signal, where z_1[n], z_2[n], ..., z_N[n] describe 
the discrete time-domain waveforms of the N channels. These N signals are segmented using 
a common segmentation, preferably using overlapping analysis windows. Subsequently, each 
segment is converted to the frequency domain using a complex transform (e.g., FFT). 
However, complex filter-bank structures may also be appropriate to obtain time/frequency 
tiles. This process results in segmented, subband representations of the input signals (which 
will be denoted by Z_1[k], Z_2[k], ..., Z_N[k], with k denoting the frequency index). 
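
As a rough illustration of this analysis step, the sketch below segments all channels with one common segmentation and transforms each segment with an FFT. The frame length, hop size and square-root Hann window are assumed values chosen for the example; the text itself only asks for overlapping analysis windows.

```python
import numpy as np


def segment_and_transform(z, frame_len=1024, hop=512):
    """Common segmentation plus FFT for all channels.

    z : real array (N, num_samples), time-domain waveforms z_i[n]
    Returns a complex array (N, num_frames, frame_len // 2 + 1) holding Z_i[k].
    """
    z = np.atleast_2d(np.asarray(z, dtype=float))
    if z.shape[1] < frame_len:
        raise ValueError("signal shorter than one analysis frame")
    # Periodic square-root Hann window (an assumed choice).
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len))
    num_frames = 1 + (z.shape[1] - frame_len) // hop
    # The same frame boundaries are used for every channel.
    frames = np.stack(
        [z[:, t * hop : t * hop + frame_len] for t in range(num_frames)], axis=1
    )
    return np.fft.rfft(frames * window, axis=-1)
```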

From these N channels, 2 down-mix channels are created, being L_0[k] and 
R_0[k]. Each down-mix channel is a linear combination of the N input signals: 

$$L_0[k] = \sum_{i=1}^{N} \alpha_i Z_i[k], \qquad R_0[k] = \sum_{i=1}^{N} \beta_i Z_i[k].$$

The parameters α_i and β_i are chosen such that the stereo signal consisting of 
L_0[k] and R_0[k] has a good stereo image. 

In the decoder the N input channels are reconstructed as follows: 

$$\hat{Z}_i[k] = C_{1,Z_i} L_0[k] + C_{2,Z_i} R_0[k],$$

where Ẑ_i[k] is an estimate of Z_i[k]. The parameters C_{1,Z_i} and C_{2,Z_i} are determined in the 
encoder and transmitted to the decoder. 
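
The down-mix and the reconstruction can be sketched as follows for one segment. The text does not say how the encoder chooses the two reconstruction parameters per channel; the least-squares fit used here is an assumption made for illustration, and all function names are hypothetical.

```python
import numpy as np


def downmix(Z, alpha, beta):
    """Z: complex array (N, bins); alpha, beta: weights (N,). Returns L0, R0."""
    return alpha @ Z, beta @ Z


def estimate_coeffs(Z, L0, R0):
    """Per-channel coefficients C1, C2 (assumed criterion: least squares)."""
    D = np.stack([L0, R0], axis=1)                 # (bins, 2)
    C, *_ = np.linalg.lstsq(D, Z.T, rcond=None)    # (2, N)
    return C[0], C[1]


def reconstruct(L0, R0, C1, C2):
    """Decoder side: estimate of channel i is C1[i] * L0 + C2[i] * R0."""
    return np.outer(C1, L0) + np.outer(C2, R0)     # (N, bins)
```

With only two transmitted channels, the N estimates lie in the span of L0 and R0, so the reconstruction is exact only in special cases such as N = 2.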




Figure E1: The positioning of the post-processing and inverse post-processing block in the 
multi-channel coder. 

Figure E2: Basic scheme for post-processing of the stereo down-mix. 

On the resulting stereo signal, post-processing can be applied in such a way that it 
mainly affects the contribution of Z_i[k] in the stereo mix. Figure E1 shows the position of this 
block in the codec. 

Figure E2 shows how this post-processing block will look. The parameter w_l 
determines the amount of post-processing of L_0[k], and w_r that of R_0[k]. When w_l is equal to 0, 
L_0[k] is unaffected, and when w_l is equal to 1, L_0[k] is maximally affected. The same 
holds for w_r with respect to R_0[k]. 

The following equations hold for the post-processing parameters w_l and w_r: 

[equations not legible in the source] 



The blocks H_1, ..., H_4 in Figure E1 are filters, which can be of various types, 
for example stereo widening filters (as shown at the end of this section). The resulting 
outputs are: 

$$\begin{bmatrix} L_{0w}[k] \\ R_{0w}[k] \end{bmatrix} = H \begin{bmatrix} L_0[k] \\ R_0[k] \end{bmatrix}, \qquad H = \begin{bmatrix} 1 - w_l + w_l H_1 & w_r H_2 \\ w_l H_3 & 1 - w_r + w_r H_4 \end{bmatrix}.$$

When the filters H_1, ..., H_4 are chosen properly, the matrix H is 
invertible. The inversion can be carried out in the decoder when the filters are known there, 
because the parameters w_l and w_r can be calculated from the transmitted parameters. The 
original stereo signal is thus available again, which is necessary for decoding of the 
multi-channel mix. 
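
The round trip through the post-processing and its inverse can be checked numerically. The sketch below assumes that H is the 2x2 matrix with rows (1 - w_l + w_l H_1, w_r H_2) and (w_l H_3, 1 - w_r + w_r H_4), applied independently per frequency index; this is a reading of the partly illegible matrix, and the function names are hypothetical.

```python
import numpy as np


def postprocess_matrix(w_l, w_r, H1, H2, H3, H4):
    """Build H for every frequency index; returns an array of shape (bins, 2, 2)."""
    H1, H2, H3, H4 = (np.asarray(h, dtype=complex) for h in (H1, H2, H3, H4))
    H = np.empty(H1.shape + (2, 2), dtype=complex)
    H[..., 0, 0] = 1 - w_l + w_l * H1
    H[..., 0, 1] = w_r * H2
    H[..., 1, 0] = w_l * H3
    H[..., 1, 1] = 1 - w_r + w_r * H4
    return H


def apply_postprocessing(H, L0, R0):
    """Encoder side (or stereo-only playback): post-processed down-mix."""
    x = np.stack([L0, R0], axis=-1)[..., None]     # (bins, 2, 1)
    y = H @ x
    return y[..., 0, 0], y[..., 1, 0]


def invert_postprocessing(H, L0w, R0w):
    """Decoder side: recover the original stereo down-mix before multi-channel decoding."""
    y = np.stack([L0w, R0w], axis=-1)[..., None]
    x = np.linalg.solve(H, y)                      # raises LinAlgError if some H is singular
    return x[..., 0, 0], x[..., 1, 0]
```

With w_l = w_r = 0 the matrix reduces to the identity, so the down-mix passes through unchanged.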

Another possibility is to transmit the original stereo signal and apply the 
post-processing in the decoder to make improved stereo playback possible without the necessity to 
determine the multi-channel mix first. 

Incorporation of the coder in a multi-channel coder described in Annex A, B or C 

On these pages an encoder is described that codes 5-channel audio. The 
following equations are applied: 

[equations not legible in the source] 

in which C_s[k] is the mono signal that results after applying OCS between the LFE 
(subwoofer) channel and the center channel. For L[k] and R[k] the following equations hold: 

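$$L[k] = C_l \left( \cos(\alpha_l)\, L_f + \sin(\alpha_l)\, e^{j\,\mathrm{IPD}_l} L_s \right)$$
$$R[k] = C_r \left( \cos(\alpha_r)\, R_f + \sin(\alpha_r)\, e^{j\,\mathrm{IPD}_r} R_s \right),$$

with: 

[equation not legible in the source] 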

where L_f is the left/front, L_s the left/surround, R_f the right/front and R_s the right/surround 
channel. The parameters IPD_l and IPD_r (inter-channel phase differences) and C_l and C_r 
(complex scaling parameters) are parameters that are determined in the OCS encoder. The 
angles in these equations can be calculated from the inter-channel intensity differences (IID) 
and the normalized cross-correlation (IC): 



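α_l = 0.5 arctan( [argument in terms of IID and IC not legible in the source] ) 

α_r = 0.5 arctan( [argument in terms of IID and IC not legible in the source] ) 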



In the decoder the following reconstruction is done: 

$$\hat{L}[k] = \beta L_0[k] + (1-\gamma) R_0[k]$$

[equation for R̂[k] not legible in the source] 

$$\hat{C}[k] = (1-\beta) L_0[k] + (1-\gamma) R_0[k],$$

where L̂[k] is an estimate of L[k], R̂[k] an estimate of R[k] and Ĉ[k] an estimate of 
C_s[k]. The parameters β and γ are determined in the encoder and transmitted to the 
decoder. 

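Knowing all this, the functions that are used for the post-processing are: 

$$w_l = f_1(\alpha_l)\, f_2(\beta), \qquad w_r = f_3(\alpha_r)\, f_4(\gamma),$$

where f_1, ..., f_4 can be any functions. For example, to apply stereo widening on the rear 
channels: 

$$f_1(\alpha) = f_3(\alpha) = \sin(\alpha),$$

$$f_2(\beta) = f_4(\beta) = \begin{cases} \sin(0.5\pi\beta) & \text{if } 0 \le \beta \le 1 \\ 1 & \text{if } \beta > 1 \\ 0 & \text{if } \beta < 0. \end{cases}$$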
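
The example weighting functions for rear-channel widening can be written out directly. The sketch assumes that the right-channel weight is formed symmetrically, as f_3(α_r) f_4(γ), and that the sine branch also covers the end points 0 and 1; the function names are hypothetical.

```python
import math


def f_alpha(alpha):
    """f_1 = f_3: larger when the surround channel contributes more."""
    return math.sin(alpha)


def f_beta(beta):
    """f_2 = f_4: sine ramp on [0, 1], clipped to 0 below and 1 above."""
    if beta < 0:
        return 0.0
    if beta > 1:
        return 1.0
    return math.sin(0.5 * math.pi * beta)


def postprocessing_weights(alpha_l, alpha_r, beta, gamma):
    """Return (w_l, w_r) from the transmitted parameters."""
    w_l = f_alpha(alpha_l) * f_beta(beta)
    w_r = f_alpha(alpha_r) * f_beta(gamma)   # assumed by symmetry with w_l
    return w_l, w_r
```

Since only transmitted parameters enter, the decoder can recompute both weights without extra side information.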



For the filter functions the following functions are then chosen (in the z-domain): 

$$H_1(z) = H_4(z) = 0.8\,(1.0 + 0.2 z^{-1} + 0.2 z^{-2})$$
$$H_2(z) = H_3(z) = 0.8\,(-1.0 z^{-1} - 0.2 z^{-2}).$$
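
Whether these example filters give an invertible matrix H can be checked on the unit circle. The sketch below assumes that the delays in the filters are z^-1 and z^-2 (the exponents are hard to read in the source) and that H has the direct filters on its diagonal and the cross filters off the diagonal; the function names and the number of frequency points are arbitrary choices.

```python
import numpy as np


def fir_response(taps, num_bins=513):
    """Frequency response of sum_n taps[n] * z**-n at num_bins points on [0, pi]."""
    omega = np.linspace(0.0, np.pi, num_bins)
    n = np.arange(len(taps))
    return np.exp(-1j * np.outer(omega, n)) @ np.asarray(taps, dtype=float)


# Assumed reading of the filters: direct paths H1 = H4, cross paths H2 = H3.
H_direct = fir_response(0.8 * np.array([1.0, 0.2, 0.2]))
H_cross = fir_response(0.8 * np.array([0.0, -1.0, -0.2]))


def min_abs_det(w_l, w_r):
    """Smallest |det H| over frequency for given post-processing weights."""
    det = ((1 - w_l + w_l * H_direct) * (1 - w_r + w_r * H_direct)
           - (w_r * H_cross) * (w_l * H_cross))
    return float(np.abs(det).min())
```

A value of `min_abs_det(w_l, w_r)` well above zero indicates that the inverse post-processing is numerically well conditioned for that pair of weights.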

This invention can be applied in any multi-channel audio coder that creates a 
compatible stereo down-mix. The invention can be applied in two ways: 

- It can be applied before transmission to provide an improved down-mix for decoders that can 
only decode the stereo audio. 

- It can also be applied in decoders that can handle the whole bit stream, as a 
user-selectable option to make improved stereo reproduction possible. 
CLAIMS: 



1. An audio encoder as described hereinbefore. 

2. An audio decoder as described hereinbefore. 

3. An audio encoding method as described hereinbefore. 

4. An audio decoding method as described hereinbefore. 

5. An audio signal as produced by the method of Claim 3. 

6. A storage medium having stored thereon a signal as claimed in Claim 5. 
ABSTRACT: 



This document gives a technical description of a multi-channel parametric 
audio coding system. The goal of this system is to describe an m-channel signal by an 
n-channel signal, with n<m, and parameters describing a spatial image in order to reconstruct 
the m-channel signal. 

Fig. 1 